






x264 Encoder is an open-source H.264/AVC video encoder. The x264 CLI is its command-line encoder tool, and x264 is used by several converters such as Handbrake, Xvid4PSP, StaxRip, RipBot264, FairUse Wizard, and MEGUI.

Free software

Version: r2491
Size: 8.3MB

Portable version



Rating: 9.2/10 (16 votes)


Latest version: r2491 (November 13, 2014)



Supported operating systems: Windows, Mac OS, Linux


More information and other downloads:
Komisar's unofficial x264 VFW codec, or another unofficial x264 VFW codec, lets you use x264 in applications that support Video for Windows (VFW) codecs, for example VirtualDub. Both support encoding and decoding.

x264 Encoder GUIs/Frontends:
Handbrake, Xvid4PSP, StaxRip, RipBot264, FairUse Wizard, MEGUI, AutoX264, HDConvertToX.






Version history / Release notes / Changelog:
Anton Mitrofanov [Sun, 12 Oct 2014 18:01:53 +0100 (21:01 +0400)]
Safety check against malicious high bit-depth input which could cause crash

Anton Mitrofanov [Sun, 12 Oct 2014 17:45:40 +0100 (20:45 +0400)]
libx264 API usage example

Henrik Gramner [Fri, 17 Oct 2014 20:35:42 +0100 (21:35 +0200)]
x86: AVX2 high bit-depth var_16x16

40->27 cycles on Haswell.

Henrik Gramner [Wed, 8 Oct 2014 21:25:35 +0100 (22:25 +0200)]
checkasm: Serialize read_time() calls on x86

Improves the accuracy of benchmarks, especially in short functions.

To quote the Intel 64 and IA-32 Architectures Software Developer's Manual:
"The RDTSC instruction is not a serializing instruction. It does not necessarily
wait until all previous instructions have been executed before reading the counter.
Similarly, subsequent instructions may begin execution before the read operation
is performed. If software requires RDTSC to be executed only after all previous
instructions have completed locally, it can either use RDTSCP (if the processor
supports that instruction) or execute the sequence LFENCE;RDTSC."

RDTSCP would accomplish the same task, but it's only available since Nehalem.

This change makes SSE2 a requirement to run checkasm.

Vittorio Giovara [Mon, 29 Sep 2014 18:51:30 +0100 (18:51 +0100)]
Support case-independent string options
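
A minimal sketch of case-independent option matching, written in Python rather than x264's C. The names `parse_enum` and `ME_METHODS` are hypothetical and chosen for illustration; the value list mirrors the names accepted by x264's `--me` option, but this is not x264's parser.

```python
def parse_enum(value, names):
    """Return the index of `value` in `names`, ignoring case, or None."""
    wanted = value.casefold()
    for index, name in enumerate(names):
        if name.casefold() == wanted:
            return index
    return None


# Illustrative list of string values for a single option.
ME_METHODS = ["dia", "hex", "umh", "esa", "tesa"]

print(parse_enum("UMH", ME_METHODS))   # 2
print(parse_enum("Hex", ME_METHODS))   # 1
print(parse_enum("fast", ME_METHODS))  # None
```

Normalizing both sides before comparing keeps the canonical lowercase names in one place while accepting any capitalization from the user.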

Anton Mitrofanov [Sat, 6 Sep 2014 17:44:49 +0100 (20:44 +0400)]
Shut up gcc -Wuninitialized warnings

Anton Mitrofanov [Fri, 5 Sep 2014 16:43:52 +0100 (19:43 +0400)]
Shut up clang -Wuninitialized warning

Anton Mitrofanov [Fri, 5 Sep 2014 16:30:47 +0100 (19:30 +0400)]
Fix few clang -Wunused-* warnings

Anton Mitrofanov [Thu, 28 Aug 2014 17:13:13 +0100 (20:13 +0400)]
Fix inappropriate instruction use

Anton Mitrofanov [Thu, 28 Aug 2014 15:38:53 +0100 (18:38 +0400)]
x264asm: warn when inappropriate instruction used in function with specified cpuflags

Anton Mitrofanov [Mon, 1 Sep 2014 22:48:00 +0100 (01:48 +0400)]
Fix VBV with true VFR streams

Anton Mitrofanov [Mon, 1 Sep 2014 19:45:00 +0100 (22:45 +0400)]
Fix VBV

Anton Mitrofanov [Wed, 30 Jul 2014 01:03:32 +0200 (03:03 +0400)]
Update to the current lavf API and fix memory leak when using --seek

Henrik Gramner [Tue, 5 Aug 2014 01:42:55 +0200 (01:42 +0200)]
x86inc: Make INIT_CPUFLAGS support an arbitrary number of cpuflags

Previously there was a limit of two cpuflags.

Henrik Gramner [Tue, 5 Aug 2014 01:42:51 +0200 (01:42 +0200)]
x86: Minor pixel_ssim_end4 improvements

Reduce the number of vector registers used from 7 to 5.
Eliminate some moves in the AVX implementation.
Avoid bypass delays for transitioning between int and float domains.

Henrik Gramner [Tue, 5 Aug 2014 01:42:47 +0200 (01:42 +0200)]
x86: Faster quant_4x4x4

Also drop the MMX version instead of doing a bunch of ifdeffery to support it after this change.

Anton Mitrofanov [Sun, 10 Aug 2014 20:46:12 +0200 (22:46 +0400)]
configure: improve cc_check for clang and ICL to not ignore unknown options

Henrik Gramner [Tue, 5 Aug 2014 01:42:44 +0200 (01:42 +0200)]
checkasm: Only call x264_cpu_detect() once

Janne Grunau [Fri, 18 Jul 2014 15:49:10 +0200 (14:49 +0100)]
aarch64: deblocking NEON asm

Deblock chroma/luma are based on libav's h264 aarch64 NEON deblocking
filter which was ported by me from the existing ARM NEON asm. No
additional persons to ask for a relicense.

Janne Grunau [Fri, 18 Jul 2014 10:29:35 +0200 (09:29 +0100)]
aarch64: intra prediction NEON asm

Ported from the ARM NEON asm.

Janne Grunau [Thu, 17 Jul 2014 16:58:44 +0200 (15:58 +0100)]
aarch64: motion compensation NEON asm

Ported from the ARM NEON asm.

Janne Grunau [Wed, 16 Jul 2014 11:03:52 +0200 (10:03 +0100)]
aarch64: transform and zigzag NEON asm

Ported from the ARM NEON asm.

Janne Grunau [Tue, 15 Jul 2014 13:57:03 +0200 (12:57 +0100)]
aarch64: quantization and level-run NEON asm

Ported from the ARM NEON asm.

Janne Grunau [Wed, 19 Mar 2014 14:48:21 +0200 (13:48 +0100)]
aarch64: pixel metrics NEON asm

Ported from the ARM NEON asm.

Janne Grunau [Fri, 18 Jul 2014 17:44:57 +0200 (17:44 +0200)]
aarch64: add utility functions for asm

Janne Grunau [Wed, 19 Mar 2014 14:45:17 +0200 (13:45 +0100)]
aarch64: add armv8 and neon cpu flags and test them

Janne Grunau [Tue, 18 Mar 2014 23:10:24 +0200 (22:10 +0100)]
aarch64: initial build support

Janne Grunau [Tue, 22 Jul 2014 19:28:27 +0200 (19:28 +0200)]
checkasm: test zigzag_sub_8x8_{frame,field}

Janne Grunau [Sun, 20 Jul 2014 18:29:01 +0200 (18:29 +0200)]
arm: use long multiplication in mc_weight_w*_neon

9-19% faster on a cortex-a9.

Janne Grunau [Sun, 20 Jul 2014 18:24:57 +0200 (18:24 +0200)]
arm: do not use aligned stores in mc_weight_w4_*neon

mc_weight_w4_*neon is also used for width 2 which does not guarantee
4-byte aligned destination. Fixes crashes caused by random memory
corruption.

Janne Grunau [Wed, 2 Apr 2014 16:31:28 +0200 (16:31 +0200)]
checkasm: add memory clobber to read_time inline asm

The memory clobber acts as a compiler barrier, preventing aggressive reordering
of read_time calls. gcc 4.8 reorders some of the initial read_time calls
after the second when targeting arm.

Janne Grunau [Sun, 20 Jul 2014 13:32:10 +0200 (13:32 +0200)]
arm: check if the assembler supports the '.func' directive

The integrated assembler in llvm trunk (to be released as 3.5) is
otherwise capable enough to assemble the arm asm correctly.

Janne Grunau [Sun, 20 Jul 2014 13:40:28 +0200 (13:40 +0200)]
arm/ppc: use $CC as default assembler

Janne Grunau [Sun, 20 Jul 2014 13:34:27 +0200 (13:34 +0200)]
arm: move instructions after '.rept' to separate line

The gas manual states "Repeat the sequence of lines between the .rept
directive and the next .endr directive ...". GNU as seems to support
instructions on the same line as .rept anyway but the integrated
assembler in llvm trunk (to be released as 3.5 in August 2014) does not.

Janne Grunau [Sun, 20 Jul 2014 13:08:17 +0200 (13:08 +0200)]
arm: set .arch/.fpu from asm.S

Janne Grunau [Sun, 20 Jul 2014 12:55:53 +0200 (12:55 +0200)]
arm: do not append CFLAGS to ASFLAGS

Tristan Matthews [Thu, 17 Jul 2014 06:03:50 +0200 (00:03 -0400)]
filters: fix sizeof mismatch

Anton Mitrofanov [Thu, 31 Jul 2014 14:17:32 +0200 (16:17 +0400)]
Fix memory leak when using select_every filter


Tsukasa OMOTO [Sun, 20 Jul 2014 15:17:11 +0200 (22:17 +0900)]
Fix cltostr.sh on OS X

Fiona Glaser [Wed, 9 Jul 2014 21:21:33 +0200 (12:21 -0700)]
Check pf_log is set in validate_parameters

Help remind people to call x264_param_default in case they didn't read the
documentation.

Anton Mitrofanov [Wed, 9 Jul 2014 15:17:04 +0200 (17:17 +0400)]
Check malloc during frame dumping

Yusuke Nakamura [Wed, 18 Jun 2014 22:21:29 +0200 (05:21 +0900)]
mp4_lsmash: Use new I/O API instead of deprecated one.

Anton Mitrofanov [Sun, 8 Jun 2014 20:19:46 +0200 (22:19 +0400)]
Remove meaningless use of abs()

Steven Walters [Sat, 31 May 2014 16:31:16 +0200 (10:31 -0400)]
MSVS 2013 Update 2 support

The first MSVS compiler C99 compliant enough to build x264.
Use `CC=cl ./configure` to compile with it.

Diego Biurrun [Tue, 15 Apr 2014 22:54:08 +0200 (22:54 +0200)]
configure: Add -Wno-maybe-uninitialized to CFLAGS

The warnings generated by -Wmaybe-uninitialized are mostly spurious.

Diego Biurrun [Wed, 7 May 2014 13:20:43 +0200 (13:20 +0200)]
build: Replace cltostr.pl by a shell script

This avoids a dependency on Perl to build OpenCL support.

Diego Biurrun [Tue, 15 Apr 2014 23:02:39 +0200 (23:02 +0200)]
build: Simplify phony target declaration with wildcards

Also add etags to list of phony targets.

Diego Biurrun [Wed, 7 May 2014 12:47:37 +0200 (12:47 +0200)]
configure: Drop workaround for obsolete gcc 4.2 on ARM

Diego Biurrun [Wed, 7 May 2014 21:43:15 +0200 (21:43 +0200)]
build: Add dependencies on x86inc.asm/x86util.asm for all .asm files

This is a little bit overzealous, but errs on the side of caution.
Generating full dependency information is also possible, but slightly
slows down the build as YASM cannot do it as a side effect of compilation.

Diego Biurrun [Sun, 27 Apr 2014 21:09:54 +0200 (21:09 +0200)]
Delete all SPARC optimizations

SPARC has been obsolete for a long time and makes little sense as a
H.264 encoding platform.

Also update authors file.

Diego Biurrun [Wed, 7 May 2014 12:46:42 +0200 (12:46 +0200)]
configure: Don't check for libavcore

libavcore was a never-released bad idea with a short lifespan.

Diego Biurrun [Sun, 27 Apr 2014 23:19:04 +0200 (23:19 +0200)]
build: Set all ASFLAGS from within configure

This is how all other toolchain flags are handled.

Diego Biurrun [Sun, 27 Apr 2014 23:23:49 +0200 (23:23 +0200)]
opencl: Check return value of fread()

common/opencl.c:138:10: warning: ignoring return value of 'fread', declared with attribute warn_unused_result [-Wunused-result]

Fiona Glaser [Sun, 20 Jul 2014 05:34:22 +0200 (20:34 -0700)]
Disable i8x8 in lossless

x264's implementation was slightly incorrect due to a vague spec, so some
decoders decoded video incorrectly.

Minimal impact on compression.

Thomas Mundt [Fri, 27 Jun 2014 20:12:06 +0200 (11:12 -0700)]
AVC-Intra: fix compatibility with Avid Transfermanager

Henrik Gramner [Tue, 8 Jul 2014 21:15:32 +0200 (21:15 +0200)]
x86: Fix SIGILL in high bit-depth intra_sad_x3_4x4_sse2

An SSE3 instruction was used in an SSE2 function.

Anton Mitrofanov [Wed, 9 Jul 2014 15:01:54 +0200 (17:01 +0400)]
Fix incorrect row predictor addressing

Somehow managed to not cause things to explode, but was clearly incorrect.
Might improve VBV in some cases to have this working right.

Anton Mitrofanov [Sat, 21 Jun 2014 21:52:39 +0200 (23:52 +0400)]
Fix b-pyramid MMCO remove for frame-packing==5

Tal Aloni [Wed, 18 Jun 2014 00:10:56 +0200 (15:10 -0700)]
Fix frame-packing==5 with some decoders

The spec mandates that frame-packing==5 requires the SEI on every frame that
begins a view sequence (i.e. the input frames L0-R0-L1-R1 have 4 view sequences,
but if reordered by the encoder to L0-L1-R0-R1 there are now 2 view sequences).
For simplicity, we write the SEI on every frame.

This fixes frame-packing==5 3D playback on some decoders (PlayStation 3, Sony
W8 series, possibly others).
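
The view-sequence count in the example above can be checked with a short Python sketch. It assumes a view sequence is a run of consecutive frames from the same view, which is how the commit message's own numbers work out; `count_view_sequences` and the frame labels are hypothetical names for illustration.

```python
from itertools import groupby


def count_view_sequences(frames):
    """Count runs of consecutive frames that belong to the same view.

    Each frame is a label such as "L0" or "R1"; the first character
    names the view (left or right).
    """
    return sum(1 for _view, _run in groupby(frames, key=lambda f: f[0]))


input_order = ["L0", "R0", "L1", "R1"]
reordered = ["L0", "L1", "R0", "R1"]

print(count_view_sequences(input_order))  # 4
print(count_view_sequences(reordered))    # 2
```

Because the count depends on the encoder's reordering, writing the SEI on every frame sidesteps the need to track where each sequence begins.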

Anton Mitrofanov [Thu, 22 May 2014 11:27:00 +0200 (13:27 +0400)]
Fix pixel_ssim_end4 asm function for x86_64 systems

James Almer [Wed, 9 Apr 2014 08:33:06 +0200 (03:33 -0300)]
x86: XOP pixel_sad_{x3, x4} high bit-depth

James Almer [Wed, 9 Apr 2014 08:33:05 +0200 (03:33 -0300)]
x86: XOP pixel_ssd_nv12_core

James Almer [Wed, 9 Apr 2014 08:33:04 +0200 (03:33 -0300)]
x86util: XOP optimized HADDD

James Almer [Wed, 9 Apr 2014 08:33:03 +0200 (03:33 -0300)]
x86: add missing initialization for high bit-depth sa8d_satd

James Almer [Sun, 6 Apr 2014 04:46:31 +0200 (23:46 -0300)]
x86: add missing initializations for high bit-depth variance

Janne Grunau [Tue, 1 Apr 2014 22:11:45 +0200 (22:11 +0200)]
arm: use the weight_fn_t typedef for mc weight function arrays

Janne Grunau [Tue, 1 Apr 2014 22:11:44 +0200 (22:11 +0200)]
arm: correct x264_mc_chroma_neon function declaration

Janne Grunau [Tue, 1 Apr 2014 22:11:43 +0200 (22:11 +0200)]
arm: do not export every asm function

Based on Libav's libavutil/arm/asm.S. Also prevents having the same
label twice for every function on systems not defining EXTERN_ASM.
Clang's integrated assembler does not like it.

Janne Grunau [Tue, 1 Apr 2014 22:11:42 +0200 (22:11 +0200)]
arm: move all .macro/.endm to column 0

William Grant [Sun, 23 Mar 2014 18:21:52 +0200 (09:21 -0700)]
aarch64: require PIC in shared mode

Janne Grunau [Sun, 16 Mar 2014 18:21:58 +0200 (17:21 +0100)]
arm: x264_coeff_last8_arm

checkasm --bench on a cortex-a9:
coeff_last8_c: 173
coeff_last8_armv6: 151

60 instead of 73 cycles in ~130k runs on the same cpu while encoding.

Janne Grunau [Sat, 15 Mar 2014 21:09:18 +0200 (20:09 +0100)]
arm: x264_store_interleave_chroma_neon

store_interleave_chroma_c: 4036
store_interleave_chroma_neon: 1043

Janne Grunau [Sat, 15 Mar 2014 20:55:50 +0200 (19:55 +0100)]
arm: x264_plane_copy_interleave_neon

plane_copy_interleave_c: 40285
plane_copy_interleave_neon: 10137

Janne Grunau [Sat, 15 Mar 2014 20:21:12 +0200 (19:21 +0100)]
arm: x264_plane_copy_deinterleave_rgb_neon

plane_copy_deinterleave_rgb_c: 31543
plane_copy_deinterleave_rgb_neon: 8312

Janne Grunau [Sat, 15 Mar 2014 19:22:49 +0200 (18:22 +0100)]
arm: load_deinterleave_chroma_f{dec,enc}_neon

load_deinterleave_chroma_fdec_c: 4055
load_deinterleave_chroma_fdec_neon: 995
load_deinterleave_chroma_fenc_c: 4071
load_deinterleave_chroma_fenc_neon: 992

Janne Grunau [Sat, 15 Mar 2014 18:22:08 +0200 (17:22 +0100)]
arm: x264_plane_copy_deinterleave_neon

plane_copy_deinterleave_c: 42988
plane_copy_deinterleave_neon: 10184

Janne Grunau [Sat, 15 Mar 2014 14:29:41 +0200 (13:29 +0100)]
arm: implement deblock_strength_neon

Based on deblock_strength_avx.

checkasm --bench on a cortex-a9:
deblock_strength_c: 14611
deblock_strength_neon: 1848
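
The benchmark pairs quoted in these commit messages are easier to compare as ratios. The Python sketch below assumes the `name: cycles` layout shown above; `parse_bench` and `speedup` are hypothetical helpers, not part of checkasm.

```python
def parse_bench(text):
    """Parse 'name: cycles' lines into a dict of integer cycle counts."""
    results = {}
    for line in text.strip().splitlines():
        name, cycles = line.split(":")
        results[name.strip()] = int(cycles)
    return results


def speedup(results, baseline, optimized):
    """Return how many times faster `optimized` is than `baseline`."""
    return results[baseline] / results[optimized]


bench = parse_bench("""
deblock_strength_c: 14611
deblock_strength_neon: 1848
""")
ratio = speedup(bench, "deblock_strength_c", "deblock_strength_neon")
print(f"{ratio:.1f}x faster")  # 7.9x faster
```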

Janne Grunau [Sat, 15 Mar 2014 11:51:11 +0200 (10:51 +0100)]
arm: add missing macro instantiation for x264_pixel_avg_4x16_neon

checkasm --bench on a cortex-a9:
avg_4x16_c: 8910
avg_4x16_neon: 2091

Janne Grunau [Thu, 13 Mar 2014 02:02:13 +0200 (01:02 +0100)]
arm: implement x264_predict_4x4_v_armv6

Alone probably not worth it but allows use of predict_4x4_dc|h_armv6
in intra_sad|satd_x3_4x4_neon.

Roland Stigge [Sun, 23 Mar 2014 18:29:37 +0200 (09:29 -0700)]
ppc: fix build on certain PowerPC variants without Altivec

Anton Mitrofanov [Mon, 21 Apr 2014 22:58:24 +0200 (00:58 +0400)]
Only add strip option '-s' for linker flags

Fixes some build warnings with clang.

Tsukasa OMOTO [Sat, 15 Mar 2014 09:53:53 +0200 (16:53 +0900)]
configure: remove an unnecessary option from CFLAGS on OS X

Fixes Clang 3.4 compilation on OS X.

Jason Garrett-Glaser [Sun, 23 Feb 2014 19:36:55 +0100 (10:36 -0800)]
Macroblock tree overhaul/optimization

Move the second core part of macroblock tree into an assembly function;
SIMD-optimize roughly half of it (for x86). Roughly ~25-65% faster mbtree,
depending on content.

Slightly change how mbtree handles the tradeoff between range and precision
for propagation.

Overall a slight (but mostly negligible) effect on SSIM and ~2% faster.

Janne Grunau [Thu, 13 Mar 2014 00:05:48 +0100 (00:05 +0100)]
arm: use available neon functions for intra_sa8d/sad/satd_x3

4% faster on main/medium, 15% faster on baseline/superfast on a cortex-a9.

Janne Grunau [Wed, 12 Mar 2014 14:35:31 +0100 (14:35 +0100)]
arm: implement x264_pixel_var2_8x16_neon

checkasm --bench on a cortex-a9:
var2_8x16_c: 5677
var2_8x16_neon: 1421

Janne Grunau [Wed, 12 Mar 2014 13:16:00 +0100 (13:16 +0100)]
arm: implement x264_pixel_var_8x16_neon

checkasm --bench on a cortex-a9:
var_8x16_c: 4306
var_8x16_neon: 791

Henrik Gramner [Sun, 23 Feb 2014 15:33:48 +0100 (15:33 +0100)]
x86: SSE2 and SSSE3 plane_copy_deinterleave_rgb

About 5.6x faster than C on Haswell.

Henrik Gramner [Sun, 16 Feb 2014 21:24:54 +0100 (21:24 +0100)]
x86: Minor mbtree_propagate_cost improvements

Reduce the number of registers used from 7 to 6.
Reduce the number of vector registers used by the AVX2 implementation from 8 to 7.
Multiply fps_factor by 1/256 once per frame instead of once per macroblock row.
Use mova instead of movu for dst since it's guaranteed to be aligned.
Some cosmetics.

Henrik Gramner [Sun, 9 Feb 2014 23:58:04 +0100 (23:58 +0100)]
x86inc: Support arbitrary stack alignments

If the stack is known to be at least 32-byte aligned we can safely store ymm
registers on the stack without doing manual alignment.

Change ALLOC_STACK to always align the stack before allocating stack space for
consistency. Previously alignment would occur either before or after allocating
stack space depending on whether manual alignment was required or not.
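
The "align first, then allocate" order can be illustrated with plain address arithmetic. This Python sketch is not the x86inc macro: `alloc_stack` is a hypothetical helper, and rounding the allocation up to a multiple of the alignment is an assumption made here so that the resulting stack pointer stays aligned.

```python
def alloc_stack(sp, size, align=32):
    """Align `sp` down to `align`, then reserve `size` bytes.

    The reserved size is rounded up to a multiple of `align` so the
    returned stack pointer keeps the same alignment.
    """
    assert align > 0 and align & (align - 1) == 0, "align must be a power of two"
    aligned_sp = sp & ~(align - 1)                    # step 1: align the stack
    padded_size = (size + align - 1) & ~(align - 1)   # step 2: round the size up
    return aligned_sp - padded_size                   # the stack grows downward


new_sp = alloc_stack(0x7FFC1234, 40)
print(hex(new_sp))       # 0x7ffc11e0
print(new_sp % 32 == 0)  # True
```

With a 32-byte aligned result, a 32-byte vector register can be stored at the new stack pointer without any further manual alignment.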

Anton Mitrofanov [Fri, 14 Feb 2014 12:53:58 +0100 (15:53 +0400)]
x86inc: warn if XOP integer FMA instruction emulation is impossible

Emulation requires a temporary register if arguments 1 and 4 are the same; this
doesn't obey the semantics of the original instruction, so we can't emulate
that in x86inc.

ffmpeg has an x86util emulation for that case; I'll add it if x264's asm ever
needs it.

Also add pmacsdql emulation.

Loren Merritt [Sat, 1 Mar 2014 03:57:56 +0100 (02:57 +0000)]
x86inc: free up variable name "n" in global namespace

Henrik Gramner [Wed, 22 Jan 2014 19:09:12 +0100 (19:09 +0100)]
x86: Pass -Worphan-labels to yasm

Makes it easier to detect typos.

Steve Lhomme [Sun, 16 Feb 2014 13:15:09 +0100 (13:15 +0100)]
Write 3D metadata when outputting Matroska

For when --frame-packing is set.

Anton Mitrofanov [Sun, 23 Feb 2014 13:56:03 +0100 (16:56 +0400)]
Don't set chroma_loc_info_present_flag for non-4:2:0

The H.264 spec says it shouldn't be set in these cases.

Jason Garrett-Glaser [Mon, 10 Mar 2014 16:42:50 +0100 (08:42 -0700)]
x264.h: fix documentation

The full details of the return values of encoder_encode and encoder_headers
were mistakenly removed a while ago; re-add them.

Anton Mitrofanov [Sun, 23 Feb 2014 12:52:57 +0100 (15:52 +0400)]
Fix pointer cast warning for 64-bit builds

Anton Mitrofanov [Mon, 10 Mar 2014 13:48:02 +0100 (16:48 +0400)]
mbaff: fix mb_field_decoding_flag tracking and simplify allow skip check

Fixes an issue with too many forced non-skips in mbaff+cavlc, as well as
non-deterministic output with mbaff+cavlc+sliced-threads.

Anton Mitrofanov [Mon, 10 Mar 2014 00:22:57 +0100 (03:22 +0400)]
Fix memory overwrite in x264_deblock_h_chroma_mbaff_sse2

Fixes possible corruption with MBAFF+sliced threads.

Jason Garrett-Glaser [Sun, 2 Mar 2014 19:09:01 +0100 (10:09 -0800)]
Fix corruption with CAVLC overflow handling in MBAFF+main profile

Probably a regression in 83561e.

Anton Mitrofanov [Mon, 10 Mar 2014 18:17:19 +0100 (21:17 +0400)]
Fix checkasm --bench output when nop_cycles is too large

Anton Mitrofanov [Wed, 22 Jan 2014 09:54:49 +0100 (12:54 +0400)]
Really fix quantization factor allocation

Actually allocate less (instead of just initialize less) and fix comments.

Yu Xiaolei [Sun, 23 Feb 2014 13:12:51 +0100 (04:12 -0800)]
Fix build with Android NDK

Android NDK does not expose sched_getaffinity.


Loren Merritt [Thu, 16 Jan 2014 22:34:46 +0100 (13:34 -0800)]
x86inc: speed up compilation with yasm

Work around yasm's inefficiency with handling large numbers of variables
in the global scope.

Kieran Kunhya [Sat, 11 Jan 2014 00:27:33 +0100 (23:27 +0000)]
Add support for AVC-Intra Class 200

James Weaver [Tue, 7 Jan 2014 11:31:58 +0100 (10:31 +0000)]
v210 input support

Assembly based on code by Henrik Gramner and Loren Merritt.

Jason Garrett-Glaser [Tue, 21 Jan 2014 22:39:33 +0100 (13:39 -0800)]
Fix quantization factor allocation

We don't need to wastefully allocate quant tables above QP_MAX_SPEC; they're
never used.

Henrik Gramner [Wed, 8 Jan 2014 01:06:56 +0100 (01:06 +0100)]
Avoid some unnecessary memory loads in macroblock_encode

Henrik Gramner [Sun, 5 Jan 2014 15:25:05 +0100 (15:25 +0100)]
Bump dates to 2014

Also update AUTHORS file and my e-mail address in the headers of various files.

Henrik Gramner [Mon, 6 Jan 2014 00:18:31 +0100 (00:18 +0100)]
Remove tools/xyuv.c

It's an old stand-alone application that isn't relevant to x264.

Anton Mitrofanov [Wed, 6 Nov 2013 23:37:23 +0100 (02:37 +0400)]
Use 8x16c wrappers with x86 asm functions for 4:2:2 with high bit depth

Henrik Gramner [Fri, 20 Dec 2013 22:44:28 +0100 (22:44 +0100)]
CLI: Avoid redundant 16-bit upconversions in piped raw input

It's not possible to seek in pipes, so if we want to skip frames we have to read and
discard unused ones. It's pointless to do bit-depth upconversions in those frames.
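
A minimal Python sketch of skipping frames on a non-seekable input by reading and discarding them. It assumes 8-bit 4:2:0 raw video with even dimensions and a buffered stream whose `read(n)` returns `n` bytes unless the input ends; `frame_size_i420` and `skip_frames` are hypothetical names, and this is not the x264 CLI's input code.

```python
import io


def frame_size_i420(width, height):
    """Bytes per 8-bit 4:2:0 frame: full luma plane plus two quarter-size chroma planes."""
    return width * height * 3 // 2


def skip_frames(stream, frame_bytes, count):
    """Read and discard up to `count` frames; return how many were skipped."""
    skipped = 0
    for _ in range(count):
        data = stream.read(frame_bytes)  # no seek: the bytes are simply dropped
        if len(data) < frame_bytes:      # input ended before a full frame
            break
        skipped += 1
    return skipped


size = frame_size_i420(4, 2)          # 12 bytes for a tiny test frame
pipe = io.BytesIO(bytes(range(36)))   # stands in for a pipe holding three frames
print(skip_frames(pipe, size, 2))     # 2
first_kept = pipe.read(size)          # the third frame, untouched
```

The discarded bytes are never converted or inspected, which is the point of the change: any per-frame processing such as bit-depth upconversion is wasted on frames that will be thrown away.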

Anton Mitrofanov [Fri, 3 Jan 2014 17:06:06 +0100 (20:06 +0400)]
Fix input support from named pipes in Windows

Steve Clark [Wed, 20 Nov 2013 18:40:23 +0100 (21:40 +0400)]
Fix ARM asm compilation with Apple assembler

Anton Mitrofanov [Wed, 13 Nov 2013 16:24:48 +0100 (19:24 +0400)]
Fix uninitialized variable

Caused if the timebase is not specified in stats file. Found by Clang.

Anton Mitrofanov [Sun, 27 Oct 2013 16:27:23 +0100 (19:27 +0400)]
Remove --visualize option.

It probably wasn't used or maintained for the last few years.

Anton Mitrofanov [Tue, 15 Oct 2013 09:32:25 +0100 (12:32 +0400)]
Add L-SMASH support as preferable alternative for MP4-muxing

Kieran Kunhya [Sat, 21 Sep 2013 19:16:12 +0100 (19:16 +0100)]
Add AVC-Intra 1080p50/60 Class 100 parameters

Also add some compatibility fixes.

Jason Garrett-Glaser [Mon, 9 Sep 2013 20:37:59 +0100 (12:37 -0700)]
Add --filler option

Allows generation of hard-CBR streams without using NAL HRD.
Useful if you want to be able to reconfigure the bitrate (which you can't do
with NAL HRD on).
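
One way to picture how filler yields a hard-CBR stream is a leaky-bucket buffer: whenever the encoder produces fewer bits than the channel delivers and the buffer would overflow, the surplus is emitted as filler. The Python sketch below is a simplified model under that assumption, with hypothetical names and made-up numbers; it is not x264's rate-control code and it ignores underflow.

```python
def filler_per_frame(frame_bits, bits_per_frame, buffer_size, initial_fill):
    """Return the filler bits needed after each frame to keep the buffer from overflowing."""
    fill = initial_fill
    fillers = []
    for bits in frame_bits:
        fill += bits_per_frame - bits           # channel delivers, frame consumes
        overflow = max(0, fill - buffer_size)   # surplus the buffer cannot hold
        fillers.append(overflow)
        fill -= overflow                        # filler brings the buffer back to full
    return fillers


print(filler_per_frame([400, 100, 700], bits_per_frame=400,
                       buffer_size=1000, initial_fill=1000))  # [0, 300, 0]
```

In this model only the small second frame needs padding; frames at or above the per-frame budget pass through unchanged.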

Anton Mitrofanov [Sun, 27 Oct 2013 12:22:51 +0100 (15:22 +0400)]
Make x264_encoder_reconfig more threadsafe

Do the reconfig when the next frame's encode begins.
Fixes some rare crashes with frame-threading and encoder_reconfig.

Jason Garrett-Glaser [Fri, 25 Oct 2013 01:19:00 +0100 (17:19 -0700)]
chroma-me: take shortcut in BI analysis

~100 cycles faster with subme>=9

Jason Garrett-Glaser [Thu, 24 Oct 2013 22:44:43 +0100 (14:44 -0700)]
CRF-max: don't warn if VBV underflow occurs

Only warn if underflow occurs for reasons other than CRF-max, as CRF-max
implies that VBV underflow is desired by the user.

Henrik Gramner [Fri, 18 Oct 2013 21:43:36 +0100 (22:43 +0200)]
x86inc: Make ym# behave the same way as xm#

This makes more sense for future implementations of templates with zmm registers.

Henrik Gramner [Fri, 18 Oct 2013 21:21:38 +0100 (22:21 +0200)]
Use calloc instead of malloc + memset

Henrik Gramner [Thu, 10 Oct 2013 15:54:12 +0100 (16:54 +0200)]
Replace gf_malloc with regular malloc in mp4 muxer

It was used as a workaround for a bug that only existed in the GPAC repository
for a few weeks back in 2010. There's no reason to keep it anymore.

Anton Mitrofanov [Tue, 8 Oct 2013 20:20:40 +0100 (23:20 +0400)]
Update to current libav/ffmpeg API

Rafaël Carré [Fri, 25 Oct 2013 15:12:24 +0100 (07:12 -0700)]
version.sh: change to use /bin/sh

Sean McGovern [Wed, 4 Sep 2013 22:15:00 +0100 (14:15 -0700)]
configure: don't generate a git version number if .git isn't present

Martin Storsjo [Tue, 3 Sep 2013 22:56:18 +0100 (14:56 -0700)]
configure: include dependency libs in the Libs pkg-config

If only a static library is built, the user of the library that just
tries to link to the lib using the flags provided by pkg-config
might not know that only a static lib exists and that he'd have to
pass --static to pkg-config to get the internal dependencies to
be able to link the library.

For a shared build, the internal dependencies are kept in Libs.private
as before.

This matches how libav's pkg-config files are generated.

Anton Mitrofanov [Thu, 17 Oct 2013 21:38:06 +0100 (00:38 +0400)]
Fix compilation in case the HAVE_LOG2F check fails spuriously

Anton Mitrofanov [Sat, 12 Oct 2013 09:01:57 +0100 (12:01 +0400)]
Fix compilation of shared library for Windows with original MinGW toolchain

Anton Mitrofanov [Tue, 8 Oct 2013 20:32:37 +0100 (23:32 +0400)]
Fix possible crashes in resize and crop filters with high bitdepth input

Tim Mooney [Tue, 3 Sep 2013 21:43:50 +0100 (13:43 -0700)]
Fix INSTALL in configure for Solaris systems

Henrik Gramner [Tue, 27 Aug 2013 23:50:31 +0100 (00:50 +0200)]
Workaround for FFMS indexing bug

If FFMS_ReadIndex is used with an empty index file it gets stuck in an infinite loop instead of returning NULL
like it's supposed to do on failure. Explicitly check if the file is empty before calling it as a workaround.
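
The guard described above can be sketched in a few lines of Python. The `read_index` callable is a hypothetical stand-in for the real index reader, not the FFMS API; only the "check for an empty file first" idea comes from the commit message.

```python
import os
import tempfile

def read_index_safely(path, read_index):
    """Return None for an empty index file instead of calling read_index."""
    if os.path.getsize(path) == 0:
        return None                  # treat an empty file as a failed read
    return read_index(path)

# Demonstration with a stand-in reader (hypothetical, not the FFMS API).
with tempfile.NamedTemporaryFile(delete=False) as f:
    empty_path = f.name              # file is created empty and closed on exit
result = read_index_safely(empty_path, lambda p: "index")
os.remove(empty_path)
```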


Anton Mitrofanov [Mon, 26 Aug 2013 19:20:31 +0200 (21:20 +0400)]
Fix masked access violation in KERNEL32

Caused crashes under gdb in Windows and might cause other unknown problems.

Hiroki Taniura [Sat, 24 Aug 2013 18:18:57 +0200 (01:18 +0900)]
Fix GPAC support on Windows

Henrik Gramner [Sun, 11 Aug 2013 19:50:42 +0200 (19:50 +0200)]
Windows Unicode support

Windows, unlike most other operating systems, uses UTF-16 for Unicode strings while x264 is designed for UTF-8.

This patch does the following in order to handle things like Unicode filenames:
* Keep strings internally as UTF-8.
* Retrieve the CLI command line as UTF-16 and convert it to UTF-8.
* Always use Unicode versions of Windows API functions and convert strings to UTF-16 when calling them.
* Attempt to use legacy 8.3 short filenames for external libraries without Unicode support.
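
The conversion steps in the list above can be illustrated with a minimal Python sketch. The helper names are hypothetical; the sketch only shows the round trip between UTF-16 (little-endian, as assumed here for Windows) and the UTF-8 form kept internally.

```python
def utf16_to_utf8(raw):
    """Convert UTF-16-LE bytes, as received from the OS, to UTF-8 bytes."""
    return raw.decode("utf-16-le").encode("utf-8")

def utf8_to_utf16(raw):
    """Convert internal UTF-8 bytes back to UTF-16-LE before a wide API call."""
    return raw.decode("utf-8").encode("utf-16-le")

wide = "видео.mkv".encode("utf-16-le")   # stand-in for a command-line argument
internal = utf16_to_utf8(wide)           # form kept inside the program
```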

Kieran Kunhya [Sat, 20 Jul 2013 19:47:59 +0200 (18:47 +0100)]
AVC-Intra support

This format has been reverse engineered and x264's output has almost exactly
the same bitstream as Panasonic cameras and encoders produce. It therefore does
not comply with SMPTE RP2027 since Panasonic themselves do not comply with
their own specification. It has been tested in Avid, Premiere, Edius and
Quantel.

Parts of this patch were written by Jason Garrett-Glaser and some reverse
engineering was done by Joseph Artsimovich.

Henrik Gramner [Mon, 8 Jul 2013 21:06:42 +0200 (12:06 -0700)]
Transparent hugepage support

Combine frame and mb data mallocs into a single large malloc.
Additionally, on Linux systems with hugepage support, ask for hugepages on
large mallocs.

This gives a small performance improvement (~0.2-0.9%) on systems without
hugepage support, as well as a small memory footprint reduction.

On recent Linux kernels with hugepage support enabled (set to madvise or
always), it improves performance up to 4% at the cost of about 7-12% more
memory usage on typical settings.

It may help even more on Haswell and other recent CPUs with improved 2MB page
support in hardware.

r2348
x86: SSSE3 implementation of pixel_sad_x3 and pixel_sad_x4

r2347
x86: Faster AVX2 pixel_sad_x3 and pixel_sad_x4

r2346
x86: Remove X264_CPU_SSE_MISALIGN functions

Prevents a crash if the misaligned exception mask bit is cleared for some reason.

Misaligned SSE functions are only used on AMD Phenom CPUs and the benefit is minuscule.
They also require modifying the MXCSR control register and by removing those functions
we can get rid of that complexity altogether.

VEX-encoded instructions also support unaligned memory operands. I tried adding AVX
implementations of all removed functions but there were no performance improvements on
Ivy Bridge. pixel_sad_x3 and pixel_sad_x4 had significant code size reductions though
so I kept them and added some minor cosmetic fixes and tweaks.

r2345
Tweak i16x16-delta-quant-avoidance code

Don't omit the delta quant if it'd raise the quantizer to do so; this fixes
a rare flickering issue caused by deblocking.

r2344
x86: faster AVX2 iDCT, AVX deblock_luma_h, deblock_luma_h_intra

r2343
Add new color primaries, transfer characteristics, matrix coefficients

r2342
Add "--stitchable" option for segmented encoding

Stops x264 from attempting to optimize global stream headers, ensuring that
different segments of a video will have identical headers when used with
identical encoding settings.

r2341
Interface: if vbv-maxrate < bitrate, set bitrate = vbv-maxrate

This probably makes more sense to the user than setting vbv-maxrate = bitrate,
as before.
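
The rule can be written out as a short Python sketch. The function name is hypothetical, and treating a `vbv_maxrate` of 0 as "not set" is an assumption made for the example.

```python
def reconcile_bitrate(bitrate, vbv_maxrate):
    """Lower bitrate to vbv_maxrate when the cap is set and smaller (kbit/s)."""
    if vbv_maxrate > 0 and vbv_maxrate < bitrate:
        bitrate = vbv_maxrate        # new behaviour: the cap wins
    return bitrate, vbv_maxrate

print(reconcile_bitrate(5000, 3000))
```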

r2340
OpenCL cosmetics

r2339
Fix possible crash when writing very large filler NALUs

Bitstream-reallocation function didn't handle the case of filler.

r2338
Fix build with PIC on some systems

r2337
Fix potential misalignment crash in AVX2 denoise_dct

r2336
Fix building with compilers without inline asm support

Also fix crash in high bit depth builds compiled with unaligned stack.

r2335
Fix compilation with OpenCL on MacOS X

Also fix crash in the case of OpenCL error during encoding.

r2334
OpenCL support improvement/refactoring

Autoload the OpenCL library so that it's not required to run an OpenCL-enabled
build of x264.

Update X264_BUILD, which should have been changed with the first patch.

r2333
x86: shave a few instructions off AVX deblock

r2332
x86: AVX2 dequant_4x4_dc

r2331
x86: AVX2 high bit-depth dequant

r2330
x86-64: 64-bit variant of AVX2 hpel_filter

~5% faster than 32-bit.

r2329
x86: AVX2 high bit-depth denoise_dct

28->15 cycles

Also reorder instructions to use fewer registers, 3 cycles faster on Ivy Bridge with 64-bit Windows.

r2328
x86: AVX2 high bit-depth quant

quant_4x4: 13->6 cycles
quant_4x4_dc: 14->8 cycles
quant_8x8: 47->24 cycles
quant_4x4x4: 48->25 cycles

r2327
x86: AVX2 add16x16_idct_dc

27 -> 19 cycles

r2326
x86: faster AVX2 quant_4x4x4

10->9 cycles

r2325
x86: AVX2 intra_sad_x3_8x8c

30->22 cycles

r2324
x86: AVX2 high bit-depth intra_sad_x3_8x8

43->24 cycles

r2323
x86: AVX2 deblock strength

30->18 cycles

r2322
x86: Faster high bit-depth intra_sad_x3_4x4

20->16 cycles on Ivy Bridge

r2321
x86: faster SSSE3 hpel

~7% faster using the pmulhrsw trick from mc_chroma.

r2320
x86-64: faster SSSE3 trellis

~2% faster trellis.

r2319
x86: 32-byte align the stack if possible

Avoids the need for manual 32 byte array alignment on compilers that support
-mpreferred-stack-boundary.

r2318
x86inc: Utilize the shadow space on 64-bit Windows

Store XMM6 and XMM7 in the shadow space in functions that clobber them.
This way we don't have to adjust the stack pointer as often,
reducing the number of instructions as well as code size.

r2317
x86: Don't use explicitly aligned versions of SAD on AVX CPUs

On modern CPUs movdqu isn't slower than movdqa when used on aligned data and using the same code in both cases saves cache.

This was already done for the high bit-depth AVX2 implementation but the aligned version still exists as dead code so remove that.

r2316
x86: Add missing initializations for high bit-depth sad_aligned

r2315
x86: add Jaguar CPU detection

r2314
x86inc: Remove .rodata kludges

The Mach-O bug was fixed in yasm 0.8.0 and we don't support versions that old.

a.out was superseded by ELF on sane systems a few decades ago.

r2313
checkasm: Use 64-bit cycle counters

Prevents overflows that can occur in some cases.

r2312
checkasm: Fix stack alignment bug

r2311
Fix invalid memcpy in sliced-threads

Likely didn't actually break in practice, but memcpy with src==dst
is incorrect.

r2310
Fix two bugs in slice-min-mbs and slices-max

Slices-max broke slice-max-size when slice-max wasn't used.
Slice-min-mbs broke in rare cases near the end of a threadslice.

r2309
x86: SSSE3 LUT-based faster coeff_level_run

~2x faster coeff_level_run.
Faster CAVLC encoding: {1%,2%,7%} overall with {superfast,medium,slower}.
Uses the same pshufb LUT abuse trick as in the previous ads_mvs patch.

r2308
x86-64: BMI2 cabac_residual functions

r2307
x86: SSSE3 ads_mvs

~55% faster ads in benchasm, ~15-30% in real encoding.
~4% faster "placebo" preset overall.

r2306
x86: AVX2 pixel_ssd_nv12_core

r2305
x86: AVX2 high bit-depth pixel_ssd

r2304
x86: AVX2 high bit-depth pixel_sad_x3/pixel_sad_x4

Also reduce the number of xmm registers used by sse2/ssse3 pixel_sad_x3.

r2303
x86: AVX2 high bit-depth vsad

r2302
x86: AVX2 high bit-depth pixel_sad

Also use loops instead of duplicating code; reduces code size by ~10kB with
negligible effect on performance.

r2301
x86: AVX2 high_bit_depth pixel_avg2, get_ref, mc_copy_w16, mc_luma

Also reduce the number of xmm registers used by mc_copy_* to avoid
saving and restoring xmm6 and xmm7 on 64-bit Windows.

r2300
x86: AVX2 nal_escape

Also rewrite the entire function to be faster and drop the AVX version which is no longer useful.

r2299
x86: AVX memzero_aligned

r2298
x86: AVX2 predict_16x16_dc

r2297
x86: AVX2 predict_8x8c_p/predict_8x16c_p

r2296
x86: AVX2 predict_16x16_p

Also fix the AVX implementation to correctly use the SSSE3 inline asm
instead of SSE2.

r2295
x86: AVX high bit-depth predict_16x16_v

Also restructure some code to reduce code size of various functions,
especially in high bit-depth.

r2294
x86: AVX2 high bit-depth predict_4x4_h

r2293
x86: AVX2 high bit-depth predict_16x16_h

r2292
x86: AVX2 high bit-depth predict_8x8c_h/predict_8x16c_h

r2291
x86util: Support ymm registers in HADD macros

r2290
x86: more AVX2 framework, AVX2 functions, plus some existing asm tweaks

AVX2 functions:
mc_chroma
intra_sad_x3_16x16
last64
ads
hpel
dct4
idct4
sub16x16_dct8
quant_4x4x4
quant_4x4
quant_4x4_dc
quant_8x8
SAD_X3/X4
SATD
var
var2
SSD
zigzag interleave
weightp
weightb
intra_sad_8x8_x9
decimate
integral
hadamard_ac
sa8d_satd
sa8d
lowres_init
denoise

r2289
x86inc: create xm# and ym#, analogous to m#

For when we want to mix simd sizes within one function.

r2288
x86inc: fix AVX emulation of cmp(p|s)(s|d)

r2287
x86-64: cabac_block_residual assembly

RDO: ~20% faster than C
Bitstream: ~50% faster than C
1-2% faster overall, highest on preset superfast/fast/medium.

r2286
OpenCL lookahead

OpenCL support is compiled in by default, but must be enabled at runtime by an
--opencl command line flag. Compiling OpenCL support requires perl. To avoid
the perl requirement use: configure --disable-opencl.

When enabled, the lookahead thread is mostly off-loaded to an OpenCL capable GPU
device. Lowres intra cost prediction, lowres motion search (including subpel)
and bidir cost predictions are all done on the GPU. MB-tree and final slice
decisions are still done by the CPU. Presets which do not use a threaded
lookahead will not use OpenCL at all (superfast, ultrafast).

Because of data dependencies, the GPU must use an iterative motion search which
performs more total work than the CPU would do, so this is not work efficient
or power efficient. But if there are GPU cycles to spare, it can often
speed up the encode. Output quality with OpenCL lookahead enabled is often
very slightly worse than with the CPU lookahead (because of the same data
dependencies).

x264 must compile its OpenCL kernels for your device before running them, and in
order to avoid doing this every run it caches the compiled kernel binary in a
file named x264_lookahead.clbin (--opencl-clbin FNAME to override). The cache
file will be ignored if the device, driver, or OpenCL source are changed.
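
One way such a cache check could work is sketched below in Python. This is a hypothetical scheme for illustration only: the commit message does not describe how x264 actually validates the file, and the function names and fingerprint format are invented for the example.

```python
import hashlib

def cache_key(device_name, driver_version, kernel_source):
    """Fingerprint the inputs that decide whether a cached binary is reusable."""
    h = hashlib.sha256()
    for part in (device_name, driver_version, kernel_source):
        h.update(part.encode("utf-8"))
        h.update(b"\0")              # separator so fields cannot run together
    return h.hexdigest()

def cache_is_valid(stored_key, device_name, driver_version, kernel_source):
    """A cached binary is reused only if every input still matches."""
    return stored_key == cache_key(device_name, driver_version, kernel_source)

source = "__kernel void k() {}"      # placeholder kernel source
key = cache_key("GPU A", "1.0", source)
```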

x264 will use the first GPU device which supports the cl_image
features required by its kernels. Most modern discrete GPUs and all AMD
integrated GPUs will work. Intel integrated GPUs (up to IvyBridge) do not
support those necessary features. Use --opencl-device N to specify a number of
capable GPUs to skip during device detection.
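
The selection rule, including the skip count, can be sketched in Python as follows. The device dictionaries and the `supports_images` field are hypothetical stand-ins for whatever capability query the real code performs.

```python
def pick_device(devices, skip=0):
    """Return the first capable device after skipping `skip` capable ones."""
    for dev in devices:
        if not dev["supports_images"]:
            continue                 # incapable devices are never counted
        if skip > 0:
            skip -= 1
            continue
        return dev
    return None                      # no suitable device found

devices = [
    {"name": "integrated", "supports_images": False},
    {"name": "gpu0", "supports_images": True},
    {"name": "gpu1", "supports_images": True},
]
```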

Switchable graphics environments (e.g. AMD Enduro) are currently not supported,
as some have bugs in their OpenCL drivers that cause output to be silently
incorrect.

Developed by MulticoreWare with support from AMD and Telestream.

r2285
weightp: improve scale/offset search, chroma

Rescale the scale factor if the offset clips. This makes weightp more effective
in fades to/from white (and any other situation that requires big offsets).

Search more than 1 scale factor and more than 1 offset, depending on --subme.

Try to find the optimal chroma denominator instead of hardcoding it.

Overall improvement: a few percent in fade-heavy clips, such as a sample from
Avatar: TLA.

r2284
Add slices-max feature

The H.264 spec technically has limits on the number of slices per frame. x264
normally ignores this, since most use-cases that require large numbers of
slices prefer it to. However, certain decoders may break with extremely large
numbers of slices, as can occur with some slice-max-size/mbs settings.

When set, x264 will refuse to create any slices beyond the maximum number,
even if slice-max-size/mbs requires otherwise.

r2283
Add slice-min-mbs feature

Works in conjunction with slice-max-mbs and/or slice-max-size to avoid overly
small slices.
Useful with certain decoders that barf on extremely small slices.

If slice-min-mbs would be violated as a result of slice-max-size, x264 will
exceed slice-max-size and print a warning.

r2282
Disable mbtree asm with cpu-independent option

Results vary between versions because of different rounding results.

r2281
Show "avs: no" with the --disable-avs option instead of an empty string

r2280
lavf input: don't use deprecated AVStream fields

Fixes building against newer libavcodecs from the Libav project.

r2279
Fix y4m input with C420paldv colorspace

r2278
x86: correctly check stack alignment for Atom hadamard_ac

Regression in r2265 (only affected compilers with broken stack alignment,
like ICL on win32).

r2277
x86inc: fix some corner cases of SWAP

SWAP with >=3 named (rather than numbered) args
PERMUTE followed by SWAP with 2 named args
used to produce the wrong permutation

r2276
Fix array overreads that caused miscompilation in gcc 4.8

r2275
Fix undefined behavior in x264_ratecontrol_mb

r2274
ARM: Fix bug in x264_quant_4x4x4_neon

Regression in r2273.

r2273
ARM: update NEON mc_chroma to work with NV12 and re-enable it

Up to 10-15% faster overall.

r2272
CABAC/CAVLC: use the new bit-iterating macro here too

r2271
quant_4x4x4: quant one 8x8 block at a time

This reduces overhead and lets us use less branchy code for zigzag, dequant,
decimate, and so on.
Reorganize and optimize a lot of macroblock_encode using this new function.
~1-2% faster overall.

Includes NEON and x86 versions of the new function.
Using larger merged functions like this will also make wider SIMD, like
AVX2, more effective.

r2270
Add AvxSynth support to the AviSynth input module.

Uses dlopen to load AvxSynth on Linux and OS X.

Allows the use of --demuxer avs for AvxSynth, though the only source filter it
can currently use is FFMS2.

Add a local copy of avxsynth_c.h and its dependent headers in extras/ so that
users don't need to actually have AvxSynth development headers installed to
enable support for it (mirroring the AviSynth behavior).

Based on a patch by 0x09 (tab@lavabit.com)

r2269
Eliminate some branchiness in ME/analysis

Faster, fewer branch mispredictions.

r2268
Fix some store forwarding stalls
There are quite a few others, but fixing most of them doesn't help or there's no
easy way to avoid them.

r2267
x86: faster AVX satd/sa8d/sa8d_satd/hadamard_ac

Use Conroe-style movddup in AVX transforms; both Sandy Bridge and Bulldozer
do movddup in the load unit, so it's totally free this way.

On Sandy Bridge:
~6% faster sa8d_satd
~5% faster hadamard_ac
~9% faster 32-bit satd
~2% faster sa8d

r2266
x86: detect Bobcat, improve Atom optimizations, reorganize flags

The Bobcat has a 64-bit SIMD unit reminiscent of the Athlon 64; detect this
and apply the appropriate flags.

It also has an extremely slow palignr instruction; create a flag for this to
avoid massive penalties on palignr-heavy functions.

Improve Atom function selection and document exactly what the SLOW_ATOM flag
covers.

Add Atom-optimized SATD/SA8D/hadamard_ac functions: simply combine the ssse3
optimizations with the sse2 algorithm to avoid pmaddubsw, which is slow on
Atom along with other SIMD multiplies.

Drop TBM detection; it'll probably never be useful for x264.

Invert FastShuffle to SlowShuffle; it only ever applied to one CPU (Conroe).

Detect CMOV, to fail more gracefully when run on a chip with MMX2 but no CMOV.

r2265
x86: combined SA8D/SATD dsp function

Speedup is most apparent for 8-bit (~30%), but gives some improvements
for 10-bit too (~12%).
64-bit only for now.

r2264
x86: port SSE2+ SATD functions to high bit depth

Makes SATD 20-50% faster across all partition sizes but 4x4.

r2263
x86: faster high bit depth ssd

About 15% faster on average.

r2262
x86: optimize and clean up predictor checking
Branchlessly handle elimination of candidates in MMX roundclip asm.
Add a new asm function, similar to roundclip, except without the round part.
Optimize and organize the C code, and make both subme>=3 and subme<3 consistent.
Add lots of explanatory comments and try to make things a little more understandable.
~5-10% faster with subme>=3, ~15-20% faster with subme<3.

r2261
Fix two bugs in predictor checking
pmv wasn't checked properly in some cases, as well as zero vector.
Output-changing portion of the following patch.

r2260
Improve lookahead-threads auto selection
Smarter decision to improve fast-first-pass performance in 2-pass encodes.
Dramatically improves CPU utilization on multi-core systems.

Tested on a quad-core Ivy Bridge (12 threads, 1080p):
Fast first pass:
veryfast: ~7% faster
faster: ~11% faster
fast/medium: ~15% faster
slow/slower: ~42% faster
veryslow: ~55% faster
CRF/1-pass:
veryfast: ~9% faster
(all others remained the same)

r2259
x86: Use SSE instead of SSE2 for copying data

Reduces code size because movaps/movups is one byte shorter than movdqa/movdqu.
Also merge MMX and SSE versions of memcpy_aligned into a single macro.

r2258
64-bit cabac optimizations

~4% faster PIC

WIN64:
~3% faster and 16 bytes shorter cabac_encode_bypass
~8% faster cabac_encode_terminal
Benchmarked on Ivy Bridge

UNIX64:
One instruction less in cabac_encode_bypass

r2257
configure: add QNX support

r2256
Windows: Enable DEP and ASLR

r2255
x86inc: Set ELF hidden visibility for global constants

r2254
x86inc: Add cvisible macro for C functions with public prefix

This allows defining externally visible library symbols.

Signed-off-by: Diego Biurrun <diego@biurrun.de>

r2253
x86inc: rename program_name to private_prefix
Synced from libav.
The new name is more descriptive and will allow defining a separate public
prefix for externally visible library symbols.

r2252
x264.h: improve x264_encoder_reconfig documentation

r2251
Cosmetics: stricter definition of parameterless functions

r2250
Update "Install and compile x264" in doc/regression_test.txt

r2249
Fix possible non-determinism with mbtree + open-gop + sync-lookahead

Code assumed keyframe analysis would only pull one frame off the list; this
isn't true with open-gop.

r2248
x86: don't use the red zone on win64

r2247
x86-64: fix trellis asm with interlacing

Regression in r2145.
Assembly assumed array was [2][64] when it was actually [2][63].
Tiny (~0.1%) compression improvement.

r2246
x86-32: use simple nop codes for <= sse

The "CentaurHauls family 6 model 9 stepping 8" family of CPUs (flags:
fpu vme de pse tsc msr cx8 sep mtrr pge mov pat mmx fxsr sse up rng
rng_en ace ace_en) SIGILLs on long nop codes.

r2245
Bump dates to 2013

r2244
x86inc: Drop tzcnt workaround

It is no longer needed now that we've bumped the version requirement of yasm to 1.2.0.

r2243
AVX2/FMA3 version of mbtree_propagate
First AVX2 function for testing.
Bump yasm version to 1.2.0 for AVX2 support.

r2242
x86inc: Use VEX-encoded instructions in AVX functions
Automatically use VEX-encoding in AVX/AVX2/XOP/FMA3/FMA4 functions for all instructions that exist in a VEX-encoded version.
This change makes it easier to extend existing code to use AVX2.
Also add support for AVX emulation of a few instructions that were missing before.

r2241
x86inc: activate REP_RET automatically
Now RET checks whether it immediately follows a branch, so the programmer doesn't have to keep track of that condition.
REP_RET is still needed manually when it's a branch target, but that's much rarer.
The implementation involves lots of spurious labels, but that's ok because we strip them.

r2240
x86inc: support stack mem allocation and re-alignment in PROLOGUE
Use this in 8-bit loopfilter functions so they can be used if
there is no aligned stack (e.g. x86-32 MSVC or ICC 10.x).

r2239
Update config.guess and config.sub

r2238
Fix crash if the first frame is forced to a non-keyframe
This is obviously bad user input, but x264 shouldn't crash if it happens.

r2237
Fix build on ARM with binutils >= 2.23.51.0.6
GAS doesn't seem to like spaces in vld1 anymore, so remove those.

r2236
Fix pthread_join emulation on win32 and BeOS
Doesn't actually affect x264, but it's more correct.

r2235
Fix typo in r2222
Slightly wrong numbers in level table.

r2234
configure: fix gpac detection with -Wp,-D_FORTIFY_SOURCE=2

r2233
Solaris: use sysconf to get processor count
Solaris responds correctly to the same value as Cygwin, so let's use that.

r2232
lavf input: allocate AVFrame correctly
Allocate AVFrames correctly with avcodec_alloc_frame().
This caused crashes with newer libavcodecs that try to free frame extradata.

r2231
Fix crash when using libx264.dll compiled with ICL for X86_64

r2230
Fix possible issues with out-of-spec QP values
Fixes a possible regression in r2228.

r2229
Attempt to optimize PPS pic_init_qp in 2-pass mode
Small compression improvement; up to ~0.5% in extreme cases.
Helps more with small slice sizes (tiny resolutions or slice-max-size).
Note that this changes the 2-pass stats file format.

r2228
Improve slice header QP selection
Use the first macroblock of each slice instead of the last of the previous.
Lets us pick a reasonable initial QP for the first slice too.
Slightly improved compression.

r2227
Update level dpb size calculation to match newer H.264 spec
Doesn't actually change encoding behavior, but makes it more correct.
Warning messages should now be accurate at higher bit depths and non-4:2:0.
Technically, since it redefines x264_level_t, this is an API version increment.

r2226
Add support for the ffmpeg/vapoursynth high bit depth y4m extensions

r2225
x86inc: Rename 3dnow2 to 3dnowext
The name "3dnowext" is more common than "3dnow2". Doesn't affect x264.

r2224
x86inc: only define program_name if the macro is unset.
This allows overriding the value from outside the file.
This can be useful if x86inc.asm is used outside of x264.

r2223
Disable ARM NEON MRC CPU test for Apple devices
The Apple A6 CPU doesn't support performance counters, so this test caused a crash.

r2222
Fix crash with no-scenecut + mbtree

r2221
Fix reconfiguring to crf=0
Lossless mode can't currently be enabled mid-stream.

r2220
Fix ALIGNED_ARRAY_EMU macros on ICL
ICL's preprocessor doesn't handle it correctly.
This fix is similar to libav's fix in 0db2d9.

r2219
Fix use of deprecated av_close_input_file call

r2218
Fix pkg-config for dynamic vs static linking

r2217
Set libm in the configure script if the OS has libm
Prerequisite for another configure patch after this.
Idea copied from libpthread.

r2216
Enhance mb_info: add mb_info_update
This feature lets the callee know which decoded macroblocks have changed.

r2215
Fix mb_info_free with sliced threads
x264 would free mb_info before it was completely done using it.

r2214
Enhance nalu_process
Add the input frame opaque pointer to the arguments.
This makes it easier to use with multiple simultaneous x264 encodes.

r2213
Improve mb_info constant mb optimization
Allow fast skipping even if the pskip MV isn't zero.

r2212
Export the average effective CRF of each frame
Useful to judge the resulting quality of a frame when VBV is enabled.

r2211
Remove special-casing for OpenBSD pthread handling
Previously it was policy to use -pthread, but OpenBSD now recommends -lpthread.
It's been libpthread anyway, and policy has changed to stop using -pthread.

r2210
x86inc: automatically insert vzeroupper for YMM functions
Backported from libav.

r2209
Free user supplied data when deleting a frame
This eliminates a memory leak when calling x264_encoder_close.

r2208
Revert r2204
People don't seem to like this so I'm just going to get rid of it.

r2207
Faster predictor checking with subme<3
Fix a typo that made an early-skip less effective.
Avoid a relatively unpredictable branch.
Slightly changed output due to the typo-fix.
~50 cycles faster on Core i7.

r2206
Try 8x8 transform analysis even when sub8x8 partitions are present
Turn off the sub8x8 partitions, try it, and turn them back on if it didn't help.
Small compression improvement with p4x4 on (~0.1-0.5%).
Also update related comments.

r2205
Support changing resolutions between passes with macroblock-tree
Implement a basic separable bilinear filter to rescale the quantizer offsets.
Structure inspired by swscale, but floating-point instead of fixed-point.
Not as optimized as it could be, but it's quite fast already.

Example compression penalties on a 720p video game recording:
First pass with 720p and second as 480p: ~-1.5% (vs. same res)
First pass with 480p and second as 720p: ~-3% (vs. same res)
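
A separable bilinear rescale of a small grid of offsets can be sketched in Python as below. This is an illustrative floating-point version written for this note, not the x264 implementation; the pixel-centre alignment and edge clamping are assumptions.

```python
def resize_1d(values, new_len):
    """Linearly interpolate a 1-D list to new_len samples (centre aligned)."""
    old_len = len(values)
    out = []
    for i in range(new_len):
        # Map the centre of output sample i into input coordinates.
        pos = (i + 0.5) * old_len / new_len - 0.5
        pos = min(max(pos, 0.0), old_len - 1.0)   # clamp at the edges
        lo = int(pos)
        hi = min(lo + 1, old_len - 1)
        frac = pos - lo
        out.append(values[lo] * (1.0 - frac) + values[hi] * frac)
    return out

def resize_2d(grid, new_w, new_h):
    """Separable resize: horizontal pass on every row, then a vertical pass."""
    rows = [resize_1d(row, new_w) for row in grid]
    cols = [resize_1d([r[x] for r in rows], new_h) for x in range(new_w)]
    return [[cols[x][y] for x in range(new_w)] for y in range(new_h)]

offsets = [[0.0, 2.0], [4.0, 6.0]]   # made-up offsets on a 2x2 grid
scaled = resize_2d(offsets, 4, 4)
```

Doing the two passes separately keeps each step a simple 1-D interpolation, which is the main appeal of a separable filter.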

r2204
Print elapsed time in encoding progress indicator

r2203
Cap ratecontrol predictor parameters
Limits VBV mispredictions after long periods of relatively constant video.

r2202
x86inc: import patches from libav
Allow manual invocation of WIN64_SPILL_XMM even under INIT_MMX
SSE version of mova is movaps rather than movdqa.
YMM version of movnta.
Add mp size for named arguments.
Fix DEFINE_ARGS when used outside of a cglobal.
Define a few more cpuflags.
3-argument wrappers for a few more instructions.

r2201
Fix crash with --fps 0
Fix some integer overflows and check input parameters better.
Also fix incorrect type specifiers for demuxer info printing.

r2200
Threaded lookahead

Split each lookahead frame analysis call into multiple threads. Has a small
impact on quality, but does not seem to be consistently any worse.

This helps alleviate bottlenecks with many cores and frame threads. In many
cases, this massively increases performance on many-core systems. For example,
over 100% faster 1080p encoding with --preset veryfast on a 12-core i7 system.
Realtime 1080p30 at --preset slow should now be feasible on real systems.

For sliced-threads, this patch should be faster regardless of settings (~10%).

By default, lookahead threads are 1/6 of regular threads. This isn't exacting,
but it seems to work well for all presets on real systems. With sliced-threads,
it's the same as the number of encoding threads.
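
The default described above can be sketched in Python. The function name is hypothetical, and the integer division and the floor of one thread are assumptions; the commit message only gives the 1/6 ratio and the sliced-threads rule.

```python
def default_lookahead_threads(threads, sliced_threads=False):
    """Pick a lookahead thread count from the encoder thread count (sketch)."""
    if sliced_threads:
        return threads               # same as the number of encoding threads
    return max(1, threads // 6)      # about one sixth, never below one (assumed)

print(default_lookahead_threads(12))
```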

r2199
Add support for RGB formats in bit-depth conversion filter

r2198
Fix some bugs in mb_info code

r2197
Add mb_info API for signalling constant macroblocks
Some use-cases of x264 involve encoding video with large constant areas of the frame.
Sometimes, the caller knows which areas these are, and can tell x264.
This API lets the caller do this and adds internal tracking of modifications to macroblocks to avoid problems.
This is really only suitable without B-frames.
An example use-case would be using x264 for VNC.

r2196
Faster chroma weight cost calculation

New assembly function with SSE2, SSSE3 and XOP implementations for calculating absolute sum of differences.

r2195
Add Level 5.2 support

r2194
Eradicate all mention of Extended Profile
x264 never supported it and never will because nobody uses it.

r2193
Fix disabling of mbtree when using 2pass encoding and zones

r2192
configure: force select -mXX gcc option for i386/x86-64
Makes multilib compilation more convenient.

r2191
Update config.guess and config.sub
Adds support for a bunch of targets, including:
aarch64 (armv8)
arm-linux-androideabi

r2190
configure: correct use of RC variable and add --extra-rcflags

r2189
ICL/MSVS: Fix shared library generation and usage
MSVS requires exported variables to be declared with the DATA keyword, and requires that imported variables be declared with dllimport.
This does not fix x264 cli being unable to use a shared library built by ICL however.

r2188
Fix intra-refresh + hrd

r2187
Fix frame input colorspace check

r2186
Fix comment in deblock.c
The code does, in fact, handle CAVLC+8x8dct correctly already.

r2185
Fix sliced-threads ratecontrol bug
Was using qp instead of qscale; could cause NaNs (not to mention less accurate results).
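
To show why the two quantities are not interchangeable, the Python sketch below converts between them. The constants (0.85, 12 and 6) are recalled from x264's rate control code for 8-bit depth and should be treated as an assumption here, not as a quotation of the source.

```python
import math

def qp2qscale(qp):
    """Map a logarithmic QP to a linear quantizer scale (8-bit depth assumed)."""
    return 0.85 * 2.0 ** ((qp - 12.0) / 6.0)

def qscale2qp(qscale):
    """Inverse mapping; only defined for qscale > 0."""
    return 12.0 + 6.0 * math.log2(qscale / 0.85)

# Raising QP by 6 doubles qscale, so the two are on very different scales.
ratio = qp2qscale(18) / qp2qscale(12)
```

Because the inverse takes a logarithm, feeding it a value on the wrong scale (zero or negative) raises an error in Python; in floating-point C code a similar mix-up is one plausible way to end up with a NaN.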


r2184
Fix clobbering of mutex/cvs
Regression in r2183.
Bizarrely seemed to work on many platforms, but crashed on win64 and may have been slower.
Only affected sliced threads during encoding, but could cause crashes on x264 encoder close even without sliced threads.

r2183
Sliced-threads: do hpel and deblock after returning
Lowers encoding latency around 14% in sliced threads mode with preset superfast.
Additionally, even if there is no waiting time between frames, this improves parallelism, because hpel+deblock are done during the (singlethreaded) lookahead.
For ease of debugging, dump-yuv forces all of the threads to wait and finish instead of setting b_full_recon.

r2182
Add full-recon API option
Fully reconstruct frames even without dump-yuv.

r2181
x86inc: switch to amdnops
Recent AMD CPUs' instruction decoders choke horribly on extremely long nops (i.e. with 4 prefixes).
Won't affect much, since we don't use ALIGN much.

r2180
BMI1 decimate functions
Intel was nice enough to make tzcnt equal to "rep bsf", which is backwards-compatible.
This means we don't actually have to add new functions to make it work.

r2179
Minor asm changes

r2178
Add row-reencoding support to VBV for improved accuracy
Extremely accurate, possibly 100% so (I can't get it to fail even with difficult VBVs).
Does not yet support rows split on slice boundaries (occurs often with slice-max-size/mbs).
Still inaccurate with sliced threads, but better than before.

r2177
Abstract bitstream backup/restore functions
Required for row re-encoding.

r2176
Add a small per-MB cost penalty for lowres
Helps avoid VBV predictors going nuts with very low-cost MBs.
One particular case this fixes is zero-cost MBs: adaptive quantization decreases the QP a lot, but (before this patch), no cost penalty gets factored in for this, because anything times zero is zero.
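
The "anything times zero is zero" problem can be shown with a toy model. The scaling formula and the penalty value below are hypothetical and chosen only for illustration; they are not x264's actual lowres cost code.

```python
def scaled_cost(cost: float, qp_delta: float) -> float:
    # Hypothetical model: lowering QP by 6 doubles the predicted cost.
    return cost * 2.0 ** (-qp_delta / 6.0)

PENALTY = 1.0  # hypothetical small per-MB penalty

zero_cost_mb = 0.0
aq_qp_delta = -6.0  # adaptive quantization lowers the QP

print(scaled_cost(zero_cost_mb, aq_qp_delta))            # stays at zero
print(scaled_cost(zero_cost_mb + PENALTY, aq_qp_delta))  # now reacts to the QP change
```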

r2175
Remove explicit run calculation from coeff_level_run
Not necessary with the CAVLC lookup table for zero run codes.

r2174
Export PSNR/SSIM in x264 API

r2173
x86inc: support yasm -f win64
Not necessary for x264, as -m amd64 already does the right thing, but used by external users of x86inc.

r2172
Fix incorrect zero-extension assumptions in x86_64 asm
Some x264 asm assumed that the high 32 bits of registers containing "int" values would be zero.
This is almost always the case, and it seems to work with gcc, but it is *not* guaranteed by the ABI.
As a result, it breaks with some other compilers, like Clang, that take advantage of this in optimizations.
Accordingly, fix all x86 code by using intptr_t instead of int or using movsxd where necessary.
Also add checkasm hack to detect when assembly functions incorrectly assume that 32-bit integers are zero-extended to 64-bit.
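
A rough model of the failure, using Python integers as stand-ins for 64-bit registers. The register contents and addresses are made up for illustration; `movsxd` here is a simulation of the instruction's effect, not real assembly.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def movsxd(reg64: int) -> int:
    # Sign-extend the low 32 bits to 64 bits, ignoring whatever is in the high half.
    low = reg64 & MASK32
    if low & 0x80000000:
        low -= 1 << 32
    return low & MASK64

base = 0x1000
stride = 32
# The ABI does not promise the high half is zero: simulate leftover garbage there.
reg = (0xDEADBEEF << 32) | (stride & MASK32)

naive_addr = (base + reg) & MASK64          # uses the full register as-is
fixed_addr = (base + movsxd(reg)) & MASK64  # uses only the valid low 32 bits

print(hex(naive_addr), hex(fixed_addr))
```

With a clean high half both addresses match, which is why the assumption "almost always" worked.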

r2171
Fix possible alignment crash when linking from MSVC
x264_cavlc_init needs to be stack-aligned now.

r2170
Fix rare overflow in 10-bit intra_satd_x3_16x16 asm

r2169
ICL: fix out of tree building and resource file usage on Windows

r2168
Add error handling for out-of-tree build

r2167
Fix RGB colorspace input
BGR/BGRA input was correct.

r2166
Fix interlaced + extremal slice-max-size
Broke if the first macroblock in the slice exceeded the set slice-max-size.

r2165
Fix regression in r2141
Broke register preservation in x264_cpu_cpuid and x264_cpu_xgetbv.
Did not cause any problems.

r2164
TBM, AVX2, FMA3, BMI1, and BMI2 CPU detection support
TBM and BMI1 are supported by Trinity/Piledriver.
The others (and BMI1) will probably appear in Intel's upcoming Haswell.
Also update x86inc with AVX2 stuff.

r2163
x86inc: add TAIL_CALL macro to abstract a common asm idiom

r2162
Minor asm optimizations/cleanup

r2161
Clean up and optimize weightp, plus enable SSSE3 weight on SB/BDZ
Also remove unused AVX cruft.

r2160
XOP frame_init_lowres
Covers both 8-bit and 16-bit, ~5-10% faster on Bulldozer.

r2159
XOP 8x8 zigzags
Field: 35(mmx) ->16(xop) cycles
Frame: 32(ssse3)->20(xop) cycles

r2158
AVX 32-bit hpel_filter_h
Faster on Sandy Bridge.
Also add details on unsuccessful optimizations in these functions.

r2157
x86inc: add high halfword register support
Might be useful in a few cases.

r2156
Change %ifdef directives to %if directives in *.asm files
This allows combining multiple conditionals in a single statement.

r2155
Use TV range algorithm for bit-depth conversions
Such sources are more common, so better to be correct for the common case.
This also produces less error for the case of full range than the previous algorithm produced for the case of TV range.
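
For illustration, a sketch of the two styles of 8-bit to 10-bit conversion. It assumes the TV range method is a plain left shift and the full range method is a rescale to the new maximum; x264's actual conversion code may differ in detail.

```python
def tv_range_8_to_10(y8: int) -> int:
    # TV (limited) range: 16..235 maps exactly onto 64..940 with a shift.
    return y8 << 2

def full_range_8_to_10(y8: int) -> int:
    # Full range: 0..255 maps onto 0..1023, rounded to nearest.
    return (y8 * 1023 + 127) // 255

for y8 in (16, 235, 255):
    print(y8, tv_range_8_to_10(y8), full_range_8_to_10(y8))
```

Applying the shift to full range input leaves white at 1020 instead of 1023, a small error compared with using a full range rescale on TV range input.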

r2154
Bump dates to 2012

r2153
Add Windows resource file
Displays version info in Windows Explorer.

r2152
Fix win32 pthread_cond_signal
Isn't used by x264 currently, so didn't cause a problem.
Fix backported from libav.

r2151
ARM: align asm functions to 4 bytes.
Some linkers apparently fail to correctly align ARM functions when mixing with Thumb code.

r2150
Fix normalization of colorspace when input is packed YUV 4:2:2

r2149
Force keyint-min 1 with Blu-ray
Fixes an issue with referencing across I-frames that's prohibited in Blu-ray for some godforsaken reason.

r2148
Fix crash in --demuxer y4m with unsupported colorspace

r2147
Fix overread/possible crash with intra refresh + VBV

r2146
Fix trellis 2 + subme >= 8
Trellis didn't return a boolean value as it was supposed to.
Regression in r2143-5.

r2145
CABAC trellis opts part 4: x86_64 asm
Another 20% faster.
18k->12k codesize.

This patch series may have a large impact on encoding speed.
For example, 24% faster at --preset slower --crf 23 with 720p parkjoy.
Overall speed increase is proportional to the cost of trellis (which is proportional to bitrate, and much more with --trellis 2).

r2144
CABAC trellis opts part 3: make some arrays non-static

r2143
CABAC trellis opts part 2: C optimizations

Hoist the branch on coef value out of the loop over node contexts.
Special cases for each possible coef value (0,1,n).
Special case for dc-only blocks.
Template the main loop for two common subsets of nodes, to avoid a bunch of branches about which nodes are live.
Use the nonupdating version of cabac_size_decision in more cases, and omit those bins from the node struct.
CABAC offsets are now compile-time constants.
Change TRELLIS_SCORE_MAX from a specific constant to anything negative, which is cheaper to test.
Remove dct_weight2_zigzag[], since trellis has to lookup zigzag[] anyway.

60% faster on x86_64.
25k->18k codesize.

r2142
CABAC trellis opts part 1: minor change in output
Due to different tie-break order.

r2141
x86inc improvements for 64-bit

Add support for all x86-64 registers
Prefer caller-saved register over callee-saved on WIN64
Support up to 15 function arguments

r2140
High bit depth SSE2/AVX add8x8_idct8 and add16x16_idct8
From Google Code-In.

r2139
MMX/SSE2/AVX predict_8x16_p, high bit depth fdct8
From Google Code-In.

r2138
XOP 8-bit fDCT
Use integer MAC for one of the SUMSUB passes. About a dozen cycles faster for 16x16.

r2137
High bit depth intra_sad_x3_4x4
From Google Code-In.

r2136
Use a large LUT for CAVLC zero-run bit codes
Helps the most with trellis and RD, but also helps with bitstream writing.
Seems at worst neutral even in the extreme case of a CPU with small L2 cache (e.g. ARM Cortex A8).

r2135
High bit depth intra_sad_x3_8x8, intra_satd_x3_4x4/8x8c/16x16
Also add an ACCUM macro to handle accumulator-induced add-or-swap more concisely.

r2134
MMX 10-bit predict_8x8c_h and predict_8x16c_h
From Google Code-In.

r2133
Some MBAFF x86 assembly functions.
deblock_chroma_420_mbaff, plus 422/422_intra_mbaff implemented using existing functions.
From Google Code-In.

r2132
More ARM NEON assembly functions
predict_8x8_v, predict_4x4_dc_top, predict_8x8_ddl, predict_8x8_ddr, predict_8x8_vl, predict_8x8_vr, predict_8x8_hd, predict_8x8_hu.
From Google Code-In.

r2131
More 4:2:2 asm functions
High bit depth version of deblock_h_chroma_422.
Regular and high bit depth versions of deblock_h_chroma_intra_422.
High bit depth pixel_vsad.
SSE2 high bit depth and MMX 8-bit predict_8x8_vl.
Our first GCI patch this year!

r2130
SSE2 and SSSE3 versions of sub8x16_dct_dc
Also slightly faster sub8x8_dct_dc

r2129
Resize filter updates
Use AVPixFmtDescriptors to pick the most compatible x264 csp for any pixel format.
Fix deprecated use of av_set_int.
Now requires libavutil >= 51.19.0

r2128
Add out-of-tree build support

r2127
Limit SSIM to 100db
Avoids floating point error for infinite SSIM (lossless).
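
A minimal sketch of the cap, assuming SSIM in dB is computed as -10*log10(1 - SSIM). The threshold of 1e-10 is chosen so that it corresponds to exactly 100 dB; the function name is illustrative.

```python
import math

def ssim_db(ssim: float, cap: float = 100.0) -> float:
    inv_ssim = 1.0 - ssim
    # A lossless encode gives SSIM == 1, so log10(0) would be infinite; clamp instead.
    if inv_ssim <= 1e-10:
        return cap
    return -10.0 * math.log10(inv_ssim)

print(ssim_db(0.99))  # about 20 dB
print(ssim_db(1.0))   # capped
```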

r2126
Fix wrong conditional inclusion of inttypes.h
inttypes.h is required by encoder/ratecontrol.c for SCNxxx macros, and HAVE_STDINT_H does not imply having inttypes.h.
stdint.h is a subset of inttypes.h, but this isn't enough for x264.
This change fixes building x264 with Android's toolchain.

r2125
Fix crash with sliced threads and input height <= 112

r2124
Fix loading custom 8x8 chroma quant matrices in 4:4:4

r2123
Fix PCM cost overflow

r2122
Fix overflow in 8-bit x86 vsad asm function

r2121
Fix crash in --fullhelp when compiled against recent ffmpeg
Don't assume all pixel formats have a description.

r2120
Fix regression in r2118
Broke trellis with i16x16 macroblocks.

r2119
Modify MBAFF chroma deblock functions to handle U/V at the same time
Allows for more convenient asm implementations.

r2118
CABAC trellis optimizations: use SIMD quant
Significant speed increase, minor change in output due to rounding.

r2117
YUV range detection and support for x264CLI
Two new options: --input-range and --range.
--input-range forces the range of the input in case of misdetection; auto by default.
--range sets the range of the output; x264cli will convert if necessary, TV by default.
--fullrange is now removed as a CLI option (but the libx264 API is unchanged).

r2116
Pass through user data

r2115
Remove unpredictable branch in CABAC dqp

r2114
x86inc: AVX symmetry optimization
3-arg AVX ops with a memory arg can only have it in src2,
whereas SSE emulation of 3-arg prefers to have it in src1 (i.e. the move).
So, if the op is symmetric and the wrong one is memory, swap them.
Eliminates redundant moves in some cases when using 3-operand without AVX with memory arguments.
Also fix movss and movsd in some cases, and flag shufps correctly as float.

r2113
checkasm: shut up gcc warnings, fix some naming of functions in results

r2112
checkasm: fix build on ARM
Because of how ALIGNED_ARRAY_16 is defined on ARM, array initialisers cannot be used here. Use memset() instead.

r2111
Improve makefile rules
Remove the need for "make clean" after most reconfigures.

r2110
Mark some local functions as static, cosmetics

r2109
Fix crash if timecode file opening fails

r2108
Configure: force PIC for shared build on PARISC and MIPS

r2107
Improve yasm version check
Previous check allowed certain earlier versions that weren't fully compatible.

r2106
Add fenc prefetching to adaptive quant
Many fewer cache misses, faster adaptive quant.

r2105
Split prefetch_fenc between colorspaces
Add 4:2:2 version.

r2104
Some more 4:2:2 x86 asm
coeff_last8, coeff_level_run8, var2_8x16, predict_8x16c_dc, satd_4x16, intra_mbcmp_8x16c_x3, deblock_h_chroma_422

r2103
Remove obsolete versions of intra_mbcmp_x3
intra_mbcmp_x3 is unnecessary if x9 exists (SSSE3 and onwards).

r2102
SSSE3/SSE4/AVX 9-way fully merged i8x8 analysis (sa8d_x9)
x86_64 only for now, due to register requirements (like sa8d_x3).

i8x8 analysis cycles (per partition):
penryn sandybridge bulldozer
616->600 482->374 418->356 preset=faster
892->632 725->387 598->373 preset=medium
948->650 789->409 673->383 preset=slower

r2101
SSSE3/SSE4/AVX 9-way fully merged i8x8 analysis (sad_x9)
~3 times faster than current analysis, plus (like intra_sad_x9_4x4) analyzes all modes without shortcuts.

r2100
Merge i4x4 prediction with intra_mbcmp_x9_4x4
Avoids a redundant prediction after analysis.

r2099
Inline i4x4/i8x8 encode into intra analysis
Larger code size, but faster.

r2098
Initial XOP and FMA4 support on AMD Bulldozer
~10% faster Hadamard functions (SATD/SA8D/hadamard_ac) plus other improvements.

r2097
ARM: update NEON chroma deblock functions to NV12 pixel format

r2096
Add /usr/lib/{64/}values-xpg6.o to $LDFLAGS on Solaris
This is required for POSIX.1-2001 compliance.

r2095
Fix linker test for -Bsymbolic
The Solaris linker only accepts -Bsymbolic for objects compiled in dynamic mode (i.e. shared objects), so pass -shared to gcc.
Additionally, for x86_32 unresolved textrels cause a linker error so mark the .text section as 'impure'.

r2094
Add $SOFLAGS to exported SOFLAGS make variable

r2093
Allow setting a chroma format at compile time
Gives a slight speed increase and significant binary size reduction when only one chroma format is needed.

r2092
Improve profile help
List high422/high444 profiles, and don't show non-high-bit-depth profiles in high bit depth builds.

r2091
Fix infinite loop parsing TDecimate Mode 3 timecode v1 files

r2090
Fix some integer overflows/signedness errors found by IOC
The only real bug here is in slicetype.c, which may or may not affect real encodes.

r2089
Fix pixel_var2 with 4:2:2 encoding
Might have caused artifacts or suboptimal chroma compression.

r2088
Fix chroma intra analysis in 4:4:4 lossless mode

r2087
Fix use of uninitialized MVs in sub8x8 RDO

r2086
Fix detection of Alpha CPU arch on alphaev67

r2085
Optimize x86 asm for Intel macro-op fusion
That is, place all loop counter tests right before their conditional jumps.

r2084
CAVLC: clean up and restructure
Somewhat faster CAVLC and RD bit-counting.

r2083
CABAC: clean up and restructure
Somewhat faster CABAC and RD bit-counting.

r2082
Some initial 4:2:2 x86 asm

r2081
4:2:2 encoding support

r2080
SSSE3/SSE4 9-way fully merged i4x4 analysis (sad/satd_x9)

i4x4 analysis cycles (per partition):
penryn sandybridge
184-> 75 157-> 54 preset=superfast (sad)
281->165 225->124 preset=faster (satd with early termination)
332->165 263->124 preset=medium
379->165 297->124 preset=slower (satd without early termination)

This is the first code in x264 that intentionally produces different behavior
on different cpus: satd_x9 is implemented only on ssse3+ and checks all intra
directions, whereas the old code (on fast presets) may early terminate after
checking only some of them. There is no systematic difference on slow presets,
though they still occasionally disagree about tiebreaks.

For ease of debugging, add an option "--cpu-independent" to disable satd_x9
and any analogous future code.

r2079
Faster intra_mbcmp_x3 for versions without dedicated asm
Select asm subroutines more intelligently in the wrapper functions.

r2078
Optimize x86 intra_predict_4x4 and 8x8

High bit depth Penryn, Sandybridge cycles:
4x4_ddl: 11->10, 9-> 8
4x4_ddr: 15->13, 12->11
4x4_hd: , 15->12
4x4_hu: , 14->13
4x4_vr: 15->14, 14->12
8x8_ddl: 32->19, 19->14
8x8_ddr: 42->19, 21->14
8x8_hd: , 15->13
8x8_hu: 21->17, 16->12
8x8_vr: 33->19,

8-bit Penryn, Sandybridge cycles:
4x4_ddr: 24->15,
4x4_hd: 24->16,
4x4_hu: 23->15,
4x4_vr: 23->16,
4x4_vl: 10-> 9,
8x8_ddl: 23->15,
8x8_hd: , 17->14
8x8_hu: , 15->14
8x8_vr: 20->16, 17->13

r2077
Use realistic alignment for intra pred benchmarks in checkasm

r2076
Fix frame packing SEI with --frame-packing 0
According to the spec, when frame_packing_arrangement_type is equal to 0, quincunx_sampling_flag shall be equal to 1.

r2075
Fix install/uninstall shared libs if SYS is WINDOWS/CYGWIN

r2074
Add Hurd support to configure

r2073
Optimize x86 intra_satd_x3_*
~7% faster.

r2072
Optimize x86 intra_sa8d_x3_8x8
~40% faster.
Also some other minor asm cosmetics.

r2071
Scale interlaced refs/mvs for mvr predictors
Slightly improves compression and fixes a Valgrind error.

r2070
Optimize predict_8x8_filter and incidentally remove a valgrind false-positive

r2069
Don't override flat SSE2 dequant functions with non-flat AVX ones
Slightly faster.

r2068
Shut up some valgrind false-positives

r2067
Avoid some unnecessary allocations with B-frames/CABAC off

r2066
Fix typo in p8x8 RD analysis
Passed wrong idx to trellis.

r2065
Fix invalid memory accesses in x86 lowres_init when width <= 16

r2064
Fix intermediate conversion for YUVJ* pixfmts with 4:4:4 encoding

r2063
Fix pic_out returned by x264_encoder_encode with 4:4:4

r2062
Fix zeroing of mvr predictors in bskip blocks

r2061
Fix: chroma planes for weightp analysis were not initted if U early-terminates and V doesn't.

r2060
Expand borders before chroma weightp analysis
Prevents mc from using uninitialized source pixels.

r2059
Another 4:4:4 chroma weightp bug fix

r2058
Fix typo in help

r2057
Improve support for varying resolution between passes
Should give much better quality, but still doesn't support MB-tree yet.
Also check for the same interlaced options between passes.
Various minor ratecontrol cosmetics.

r2056
asm cosmetics: base-4 constants for shuffles

r2055
Enable some existing asm functions that were missing function pointers
pixel_ads1_avx, predict_8x8_hd_avx
High bit depth mc_copy_w8_sse2, denoise_dct_avx, prefetch_fenc/ref, and several pixel*sse4.

r2054
Remove some unused, broken, and/or useless functions
Unused frame_sort.
Unused x86_64 dequant_4x4dc_mmx2, predict_8x8_vr_mmx2.
Unused and broken high_depth integral_init*h_sse4, optimize_chroma_*, dequant_flat_*, sub8x8_dct_dc_*, zigzag_sub_*.
Useless high_depth dequant_sse4, dequant_dc_sse4.

r2053
asm cosmetics: merge all the variants of ABS macros

r2052
asm cosmetics part 2
These were split out of the cpuflags commit because they change the output executable.

r2051
asm cosmetics: INIT_MMX/XMM/YMM now support a cpuflags argument

Reduces the number of macro args that need to be passed around.
Allows multiple implementations of a given macro (e.g. PALIGNR) to check
cpuflags at the location where the macro is defined, instead of having
to select implementations by %define at toplevel.
Remove INIT_AVX, as it's replaced by "INIT_XMM avx".

This commit does not change the stripped executable.

r2050
Import x86inc.asm patches from libav

r2049
Cosmetics: s/mmxext/mmx2/

r2048
Fix two bugs in 4:4:4 chroma weightp analysis
Caused slightly worse compression.

r2047
Fix "--asm avx"
Previously required "--asm sse2fast,fastshuffle,sse4.2,avx".

r2046
Re-add support for glibc <2.6, which doesn't have CPU_COUNT

r2045
Avoid using deprecated libavformat functions
Replace av_find_stream_info with avformat_find_stream_info.
Now requires libavformat 53.3.0 or newer.

r2044
Use assembly versions of some deblocking functions in MBAFF

r2043
Move X264_VERSION / X264_POINTVER from config.h to x264_config.h
This makes them available to external programs as part of the public API.

r2042
Fix padding bug in x264_expand_border_mbpair

r2041
Timecode parsing: Add missing initialization
Fix crash when failed to parse timecode file before malloc pts.
Fix detection of user timebase considered to be exceeding H.264 maximum.

r2040
Fix crash with high bitdepth 4:2:0 input

r2039
x86 asm cosmetics
Use FDEC_STRIDEB where appropriate.

r2038
Fix a bug in lossless sub-8x8 RD
Caused crashes in rare cases with lossless encoding. Regression in 4:4:4.

r2037
Improved p8x4/4x8 search decision
Use the same thresholding as for p16x8/8x16.
Does p8x4/4x8 search more often, for a small compression improvement.

r2036
Add --subme 11, which disables all early terminations in analysis
Necessary for a future trellis mode decision/motion estimation patch.
Also add the slowest presets to the regression test.

r2035
Some trivial changes to RD thresholds
The output-changing portion of the next patch.

r2034
Allow setting a wider range of chroma QP offsets
This allows use of the full range of chroma QP offsets, even in combination with the automatic psy-based adjustments.

r2033
Optimize macroblock_deblock_strength, add more early terminations

r2032
Function-pointerify MBAFF deblocking functions

r2031
Clean up MBAFF deblocking code

r2030
Optimize frame_deblock_row

r2029
Shrink two arrays

r2028
Add support for the new (4:4:4) colorspaces to x264_picture_alloc

r2027
Various cosmetics

r2026
Improve configure help

r2025
Use $optarg for some configure options

r2024
Linux x264_cpu_num_processors(): use glibc macros
The cpu_set_t structure is considered opaque.
Also handle sched_getaffinity() error case if "cpusetsize is smaller than the size of the affinity mask used by the kernel."
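
The same idea, counting CPUs from the affinity mask without looking inside the mask structure, can be sketched in Python. This is only an analogue of the C change: `os.sched_getaffinity` is available on Linux, and the fallback for other platforms is an assumption added for portability.

```python
import os

def usable_cpu_count() -> int:
    # Prefer the affinity mask of the current process (pid 0 means "self").
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # Fallback where affinity is not exposed; cpu_count() may return None.
    return os.cpu_count() or 1

print(usable_cpu_count())
```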

r2023
Fix spurious "stream properties changed" with --seek option on some inputs

r2022
Fix use of deprecated libavcodec functions
Replace avcodec_open with avcodec_open2. Now requires libavcodec 53.6.0 or newer.

r2021
Fix nalu_process callback with HRD

r2020
Fix incorrect chroma swap for some input pixfmts

Problem occurred if pixfmt of lavf/ffms input was PIX_FMT_RGB24 or PIX_FMT_YUV444P.

r2019
Fix resize filter crash with YUVJ* input pixfmt

r2018
RGB encoding support
Much less efficient than YUV444, but easy to support using the YUV444 framework.

r2017
4:4:4 encoding support

r2016
Properly weight slice header lambda in chroma weightp analysis

r2015
Better x86 high bit depth predict_8x8c_p
Avoid the need to check for corner cases by reordering arithmetic.
Also make a minor optimization to high bit depth predict_16x16_p.

r2014
Eliminate extra layer of indirection for sps/pps references
Also remove poc type 1 support (it didn't work anyways) to reduce sps size.

r2013
Fix SSIM calculation with sliced threads

r2012
Avoid possible NaNs in B-frame output stats

r2011
ARM: do not override the toolchain default for FPU ABI

r2010
Fix link errors with libswscale/libavutil as shared libraries

r2009
Fix deprecation in libavformat usage
Replace av_open_input_file with avformat_open_input. Now requires libavformat 53.2.0 or newer.

r2008
Fix various issues with VBV+threads
Eliminate the race condition with interframe row predictors and threads.
Recalculate frame_size_estimated at the end of a frame, for improved update_vbv_plan.
Some cosmetics.

r2007
Fix MBAFF row VBV ratecontrol
Reverts most of r1984 and implements a much simpler solution.

r2006
Make ratecontrol_mb less slow

r2005
Resize filter updates
Fix use of deprecated sws_getContext.
Fix uses of sws_format_name.
Fix stream change warning not occurring on the first resolution change.
Drop cpu detection, as it is now performed internally by swscale.
Update swscale version requirements.

r2004
AVX mbtree_propagate
Up to ~20-30% faster than SSE2 on Sandy Bridge.

r2003
Use -vsync 0 with ffmpeg regression test

r2002
Inline emms instructions on x86 if possible

r2001
Make left_index_table const
Should allow for some missed compiler optimizations in macroblock_cache_load.

r2000
Make --profile main/baseline force off CQMfile

r1999
Fix VBV bug caused by zero i_row_satd value for first and last row

r1998
Fix crash with VBV + forced QP

r1997
Fix VBV bug with MinCR limit

r1996
Fix bitstream reallocation with slice-max-size + MBAFF

r1995
Improve build system capabilities
Make static lib and CLI optional.
Support linking CLI to system libx264.
Don't strip by default, to match GNU packaging guidelines.

r1994
Slightly speed up x86 CABAC asm
Also make some various cleanups.

r1993
Faster pixel_memset
~4x faster.
Also inline plane_expand_border for improved constant propagation.

r1992
Add checkasm tests for memcpy_aligned, memzero_aligned
Also make memcpy_aligned support sizes smaller than 64.

r1991
MBAFF: Add regularization to VSAD metric
Bias towards the MBAFF decisions made in neighboring mb pairs.
~2% better compression on a random 1080i HDTV source.

r1990
MBAFF: Improve handling of bottom row mod32 padding
Force skip on any MBs entirely outside the frame
If an mb pair in the bottom row is chosen to be progressive, re-pad the bottom rows progressively.

r1989
MBAFF: Add frame/field MB stats

r1988
MBAFF: Template direct spatial

r1987
MBAFF: Template cache_load and cache_load_neighbours

r1986
MBAFF: Make interlaced support a compile time option

r1985
MBAFF: Don't call zigzag_init for every mb

r1984
MBAFF: Modify ratecontrol to update every two rows

r1983
MBAFF: Add support for slice-max-size

Also add slice-max-size to the regression tests.

r1982
MBAFF: Add support for slice-max-mbs

r1981
MBAFF: Adaptive quantization

Compute energy for interlaced and progressive choices and pick the least.

r1980
MBAFF: Enable adaptive MBAFF with VSAD decision

r1979
MBAFF: Create a VSAD DSP function

x86 assembly by Jason Garrett-Glaser. This gives roughly 30x speed
increase over the C version.

r1978
MBAFF: Direct spatial

r1977
MBAFF: Direct temporal

r1976
MBAFF: Calculate bipred POCs

Need to calculate two tables for the cases where the current macroblock is
progressive or interlaced as refs are calculated differently for each.

r1975
MBAFF: Use both left macroblocks for ref_idx calculation

r1974
MBAFF: First edge deblocking

r1973
MBAFF: Implement left edge deblocking functions

r1972
MBAFF: Add extra data to the deblock strength structure

r1971
MBAFF: Deblocking support

r1970
MBAFF: Move common code from deblock functions

r1969
MBAFF: Add mbaff deblock strength calculation

Move call to deblock_strength to x264_macroblock_deblock_strength to
keep deblock strength calculation in one place.

r1968
MBAFF: Update x264_cabac_mvd_sum_mmxext to work with larger MVDs.

Author: Loren Merritt <pengvado@akuvian.org>

r1967
MBAFF: Clamp MVDs to 66 instead of 33

r1966
MBAFF: CABAC encoding of skips

r1965
MBAFF: Track what interlace decision the decoder is using

r1964
MBAFF: Fix mvy bounds

Fix MV clipping

r1963
MBAFF: Copy deblocked pixels to other plane

r1962
MBAFF: Disallow skip where predicted interlace flag would be wrong

r1961
MBAFF: Inter support

r1960
MBAFF: Neighbour calculation

Back up intra borders correctly and make neighbour calculation several times longer.

r1959
MBAFF: Store references to the two left macroblocks

r1958
MBAFF: Store left references in a table

r1957
MBAFF: Disable adaptive MBAFF when subme 0 is used

r1956
MBAFF: Save interlace decision for all macroblocks

r1955
Fix bug in NAL buffer resizing
Also properly terminate if NAL buffer resizing fails.

r1954
Fix zone bitrate multiplier and QP forcing in 2-pass mode
Previously zone changes could affect frames outside of the given frame range (around 20 neighboring frames).

r1953
Use float constants in qp rounding
Slight performance improvement and fixes slight difference in output between gcc 3.4 and 4.5.

r1952
Fix bugs with ratecontrol reconfiguration
Initialization of some parameters was missed or wasn't synchronized with other threads

r1951
More validation of input parameters
This fixes a crash with --me umh and insane values of --me-range.

r1950
Fix bug in --b-adapt 2 with --rc-lookahead >248
Problem caused by buffer overflow in strcpy.

r1949
Check for invalid pixfmts in lavf demuxer

r1948
Fix regression in r1944
Broke sliced-threads + slice-max-size/slice-max-mbs.

r1947
Precalculate CABAC initialization contexts
Slightly faster encoding with lots of slices.

r1946
Avoid redundant log2f calls in mv cost initialization
Saves around 100 million clock cycles on x264 init.

r1945
CABAC residual: cleanup and optimizations
Also kill all Hungarian notation while we're at it.
Trim an instruction off cabac_encode_bypass.

r1944
Validate input parameters more carefully
Get rid of redundant warnings upon encoder_reconfig calls.
Also avoid encoder_reconfig turning off psy_rd/trellis.

r1943
Fix VFR MB-tree to work as intended
Should improve quality with FPSs much larger or smaller than 25.

r1942
Support more recent GPAC versions

r1941
Fix decoder desync with positive --chroma-qp-offset and zones

r1940
Use AVMEDIA_TYPE_VIDEO instead of deprecated CODEC_TYPE_VIDEO

Fixes build with lavf/lavc 53.

r1939
Force pic-struct for Blu-ray compat + fake-interlaced

r1938
Fix open-gop with no-psy

r1937
Fix build with disabled asm

r1936
Improve Blu-ray compliance
Use dec_ref_pic_marking SEIs to repeat B-ref referencing information.
Don't allow B-frames to reference frames outside their minigop.

r1935
Consolidate Blu-ray hacks into --bluray-compat
This option is now required for Blu-ray compatibility.
--open-gop bluray is now gone (using bluray-compat and open-gop implies a Blu-ray compatible open-gop).
This option doesn't automatically enforce every aspect of Blu-ray compatibility (e.g. resolution, framerate, level, etc).

r1934
Add SSE support to rectangle.h for 16-byte stores
Uses GCC vector intrinsics; may be suboptimal on particularly old GCC versions.

r1933
Do not force Intel Compiler to target pre-mmx architecture for x86
Caused a speed penalty against gcc equivalents.

r1932
Warn users when using --(psnr|ssim) without --tune (psnr|ssim)
This is a counter to the proliferation of incredibly stupid psnr/ssim "benchmarks" of x264 in which the benchmarker conveniently "forgot" --tune psnr/ssim, crippling x264 in the test.

r1931
Remove redundant mbcmp calls in weightp analysis

r1930
Use integer math for filler size calculation

r1929
Disable progress for FFMS input with --no-progress

r1928
Fix bug in intra-refresh ratecontrol
Row SATDs were slightly incorrect.

r1927
Cosmetics: fix some signedness issues found by -Wsign-compare

r1926
Minor fixes
Fix a comment typo.
Align an array properly.
Make x264_scan8 unsigned: saves a bunch of movsxd instructions on x86_64.

r1925
Improve C99 support checks in configure
Fixes configuration with Intel compiler in some cases.

r1924
Eliminate the possibility of CAVLC level code overflow
Instead, if it happens, just re-encode the MB at higher QPs until it fits.
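
The retry strategy can be sketched as a simple loop. Everything here is hypothetical: `encode_mb` stands in for a macroblock encoder that reports whether its level codes fit, and the step of one QP per retry and the limit of 51 are assumptions for illustration.

```python
def encode_mb_until_fits(encode_mb, qp: int, qp_max: int = 51):
    # Retry at progressively higher QP until the encoder reports no overflow.
    while True:
        fits, bits = encode_mb(qp)
        if fits or qp >= qp_max:
            return qp, bits, fits
        qp += 1

def toy_encode(qp: int):
    # Toy encoder: pretend level codes overflow below QP 30.
    return qp >= 30, 1000 - 10 * qp

print(encode_mb_until_fits(toy_encode, 26))
```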

r1923
x86 SIMD versions of optimize_chroma_dc
SSE2/SSSE3/SSE4/AVX implementations.
About 3x faster.

r1922
Add Altivec version of mc_weight

r1921
Add Altivec versions of mbcmp_x functions
These aren't merged versions, they just call the existing asm code.
A merged implementation would of course be faster.

r1920
Recognize cygwin as itself when not targeting mingw
Also fix broken thread detection on cygwin.

r1919
Patch Intel's CPU dispatcher
Reduces Intel Compiler's bias against non-Intel CPUs.

Big thanks to Agner for the original information on how to do this.

r1918
Intel Compiler support

Big thanks to David Rudie, the original author of this patch.

r1917
Cosmetics: make struct definition braces consistent

r1916
Fix restoring of console title on Windows with ffms indexing

r1915
Fix possible buffer overflow in mp4 muxer

r1914
Remove inline asm syntax not supported by LLVM's assembler
Doesn't affect compiled output outside of LLVM.

r1913
Fix 10L in r1912
SSSE3 code got used in MMX/SSE2 and vice versa (in hpel).

r1912
Add AVX functions where 3+ arg commands are useful

r1911
Frame-packing 3D: don't place scenecuts on right views
Caused problems for some players.

r1910
Improve slice-max-size handling of escape bytes
More accurate but a bit slower. Helps deal with a few obnoxious corner cases where the current algorithm failed.

r1909
Use bs_write1 wherever possible in header writing

r1908
Remove obsolete mvcost init code

r1907
Fix memory leak on encoder close if not all frames are flushed

r1906
Fix signedness bug in CPU detection
Luckily didn't affect anything due to C signedness rules.

r1905
Fix dumb bug caused by stray semicolon
Caused noise reduction to run incorrectly in part of RD, but probably had no effect.

r1904
Fix malloc of zero size
Caused x264 to fail with some settings on systems that return a NULL pointer for malloc(0), like Solaris.

r1903
Fix crash in mp4 muxer after failure of x264_encoder_open

r1902
Fix shadowed variable warning in ffms.c

r1901
Fix some Intel compiler warnings

r1900
Fix 10L in r1886
Aspect ratio can't be set before SPS is initted.

r1899
Improve update interval of x264cli progress information
Now updates every 0.25s instead of every N frames.
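
A time-based update interval can be sketched as below. The class and its names are illustrative and not taken from x264cli; the clock is injectable only to make the behaviour easy to check.

```python
import time

class ProgressThrottle:
    def __init__(self, interval: float = 0.25, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self.last = None  # time of the last printed update

    def should_update(self) -> bool:
        # Allow an update only if enough wall-clock time has passed,
        # regardless of how many frames were encoded in between.
        now = self.clock()
        if self.last is None or now - self.last >= self.interval:
            self.last = now
            return True
        return False

throttle = ProgressThrottle()
for frame in range(3):
    if throttle.should_update():
        print(f"frame {frame}")
```

Compared with updating every N frames, this keeps the display rate steady for both very fast and very slow encodes.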

r1898
Windows: restore previous console title after encoding
MSDN docs claim that SetConsoleTitle's effect is reverted when the process terminates, but this doesn't always work properly.
Accordingly, manually revert the console title at the end of encoding.

r1897
Allow WEIGHTP_FAKE in interlaced mode
It seems to work fine as-is even though real weightp doesn't support interlacing yet.

r1896
Output pic struct information in libx264 API

r1895
Enable FastShuffle on Penryn and Nehalem CPUs without SSE4

r1884
Hotfix for some bugs in VBV emergency

r1883
Fix warnings in cpu.c

r1882
Check for OS AVX support in addition to CPUID
Even if not using ymm registers, AVX operations will cause SIGILLs on unsupported OSs.
On Windows, AVX is only available on Windows 7 SP1 or later.

r1881
VBV emergency mode
Allow ratecontrol to select "quantizers" above the maximum.
These "quantizers" progressively decimate the source to avoid VBV underflow.
x264 is now VBV compliant even with input as evil as /dev/random.

r1880
Initial AVX support
Automatically handle 3-operand instructions and abstraction between SSE and AVX.
Implement one function with this (denoise_dct) as an initial test.
x264 can't make much use of the 256-bit support of AVX (as it's float-only), but 3-operand could give some small benefits.

r1879
Double the base framerate for frame-sequential 3D files
A 60fps frame-sequential 3D file is really only 30 FPS, just alternating between eyes.
Accordingly, ratecontrol should treat it as if it was really 30 FPS.
This will increase the bitrate at the same CRF level for such videos when --frame-packing 5 is used.

r1878
Add --input-fmt option to lavf input
Conforms to ffmpeg's `-f` option.
Use this when lavf fails to guess the input format.

r1877
Two improvements to regression test script
Use SHA-1 hashes for temporary file names to avoid exceeding OS filename length limits.
Correctly return to the original branch after testing if you were on a branch.
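
A minimal sketch of the SHA-1 naming idea, assuming the temporary name was previously derived from the full option string of a test case. The function name `temp_name` and the `.mkv` suffix are hypothetical and not taken from the actual test script.

```python
import hashlib


def temp_name(test_description, suffix=".mkv"):
    """Map an arbitrarily long test description to a fixed-length file name."""
    # A SHA-1 hex digest is always 40 characters, so the resulting name
    # stays well below typical OS filename length limits.
    digest = hashlib.sha1(test_description.encode("utf-8")).hexdigest()
    return digest + suffix
```

The same description always yields the same name, so a test can find its own output again without storing the mapping anywhere.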

r1876
Add some missing values to the non-extended SAR table

r1875
Bump dates to 2011

r1874
More correctly write frame-packing SEI flags

Bug reported by Nero.

r1873
Don't die in x264_encoder_close if an error occurred in x264_encoder_encode
Also clean up properly in x264.c (mostly useful for finding bugs in cleanup).

r1872
Fix reconfiguration of b_tff
Attempting to change field order during encoding could cause slight corruption.

Also fix delta_poc_bottom to be correctly set if interlaced mode is used without B-frames.

r1871
Fix x264 CPU detection with >=64 CPUs on Windows
x264 won't actually use more than one processor group's worth of CPUs, however.
This isn't a problem, as a single x264 instance can't effectively use a full 64 cores anyways.

r1870
Remove high bit depth mmx quant
It was using pmuludq which is sse2, and the function isn't really possible without pmuludq.

r1869
Fix cacheline check in avg2 w20 cache32
Didn't result in incorrect output, only slightly decreased speed on a few obsolete systems.

r1868
instruction in high bit depth ssd_nv12_mmxext

r1867
VFR/framerate-aware ratecontrol, part 2
MB-tree and qcomp complexity estimation now consider the duration of a frame in their calculations.
This is very important for visual optimizations, as frames that last longer are inherently more important quality-wise.
Improves VFR-aware PSNR as much as 1-2db on extreme test cases, ~0.5db on more ordinary VFR clips (e.g. deduped anime episodes).

WARNING: This change redefines x264's internal quality measurement.
x264 will now scale its quality based on the framerate of the video due to the aforementioned frame duration logic.
That is, --crf X will give lower quality per frame for a 60fps video than for a 30fps one.
This will make --crf closer to constant perceptual quality than previously.
The "center" for this change is 25fps: that is, videos lower than 25fps will go up in quality at the same CRF and videos above will go down.
This choice is completely arbitrary.

Note that to take full advantage of this, x264 must encode your video at the correct framerate, with the correct timestamps.

r1866
Improve reference ordering in interleaved 3D video
Provides a decent compression improvement when encoding interleaved 3D content (--frame-packing 5).
Helps more without B-frames and at lower bitrates.
Note that x264 will not do this optimization unless --frame-packing 5 is used to tell x264 that the source is interleaved 3D.

Tests consistently show that interleaved frame packing is by far the best way to compress 3D content.
It gives a ~35-50% compression benefit over separate streams or top/bottom or left/right coding.

Also finally add support for L1 reference reordering (in B-frames).
Also add support for reordered ref0 in L0 and L1 lists; could be useful in the future for other things.

r1865
Cosmetics: fref0/1 -> fref[2] and i_ref0/1 -> i_ref[2]
A much-needed refactoring, plus makes the next patch easier.

r1864
Check an extra offset during weightp analysis
Up to 0.1 - 0.6 dB gain on some fade-ins with --weightp 1, less with --weightp 2.

r1863
SSE2 high bit depth SSIM functions

Patch from Google Code-In.

r1862
SSE2 high bit depth intra_predict_(8x8c|16x16)_p

Patch from Google Code-In.

r1861
MMX high bit depth coeff_last4

Patch from Google Code-In.

r1860
SSE2 high bit depth zigzag_interleave_cavlc

Patch from Google Code-In.

r1859
MMX/SSE2/SSSE3 high bit depth frame_init_lowres functions

Patch from Google Code-In.

r1858
MMX high bit depth 4x4 intra predict functions
DDR and HD directions, as well as making HU faster.
Also enable some SSE2 versions of high bit depth functions that were added but not properly enabled.

Patch from Google Code-In.

r1857
SSE2 high bit depth 8x8 intra predict functions
DDL, DDR, VR, HU, and HD directions, as well as the 8x8 filter.
Also make 8-bit MMX VR faster, by backporting the optimizations from the high bit depth version.

Patch from Google Code-In.

r1856
MMX/SSE2 high bit depth 8x8c intra predict functions

Patch from Google Code-In.

r1855
MMX version of high bit depth plane_copy
And various cosmetics.

Patch from Google Code-In.

r1854
Faster x86 predict_8x8c_dc, MMX/SSE2 high bit depth versions

r1853
SSSE3 high bit depth sad_aligned functions

r1852
MMX/SSE2 high bit depth interleave functions

Patch from Google Code-In.

r1851
MMX/SSE2 high bit depth avg functions

Patch from Google Code-In.

r1850
MMX/SSE2 high bit depth deinterleave functions

Patch from Google Code-In.

r1849
Shut up some incorrect gcc uninitialized variable warnings

r1848
Write --crop-rect and --frame-packing options to x264 SEI

r1847
Add missing space to parameter SEI

r1846
Fix typo in documentation

r1845
Fix redundant linebreaks in statsfile with weightp

r1844
Use cross_prefix for strings in endian test and as test

r1843
Fix checkasm test for quant in high bit depth
Eliminate some spurious failures.

r1842
Fix broken YV12 handling in the resize filter

r1841
Fix bug with negative lookahead mb costs in high bit depth

r1840
Fix overflow in SSIM calculation in 10-bit

r1839
Fix some possible overflows in VFR ratecontrol with extreme timebases

r1838
Fix memory leak in lavf demuxer.
Leak only occurred with input files that have more than one video stream.

r1837
Fix satd predictors with high bit depth
Resulted in odd CRF-mode results with --no-mbtree, as well as suboptimal VBV handling.

r1836
Fix compile error with high bit depth and disable-asm

r1835
Really fix gcc win32 misalignment crash
gcc's -fno-zero-initialized-in-bss only works if an explicit initializer (e.g. = {0}) is used.

r1834
Support for native Windows threads

Patch originally by Pegasys Inc.

r1833
MMX/SSE2 high bit depth weight_cache/offset(sub|add) functions

Patch from Google Code-In.

r1832
SSE2 high bit depth dequant functions

Patch from Google Code-In.

r1831
SSE2 high bit depth zigzag functions

Patch from Google Code-In.

r1830
MMX/SSE2 versions of high bit depth store_interleave

Patch from Google Code-In.

r1829
Add frame-packing SEI support for signalling 3D video

r1828
Allow 8x8dct+cavlc+lossless with subme>=6

r1827
Add interlaced/no-interlaced case to regression test script

r1826
Save more memory with weightp in >8-bit

r1825
.gitignore more untracked file types

r1824
Work around gcc/ld alignment bug on win32
Fixes problems due to misalignment of static zero arrays (win32 ld can't align .bss properly).

r1823
Fix high bit depth intra pred functions
And re-enable them accordingly.

Patch from Google Code-In.

r1822
Fix weightp analysis with high bit depth

r1821
Fix build error in high depth
Caused by multiple definitions of x264_add8x8_idct_sse2.

r1820
Hotfix for high bit depth
Temporary fix for some unaligned access crashes.

r1819
Delete x264_config.h on distclean

r1818
Tons of high bit depth intra predict asm

Patch from Google Code-In.

r1817
SSE2 high bit depth 8x8/16x16 idct/idct_dc

Patch from Google Code-In.

r1816
Create and install x264_config.h
This header can be used to determine the bit-depth and license of libx264.

r1815
Detect Avisynth initialization failures
Detect if there is a critical Avisynth initialization failure and print the associated error.
This, however, requires a feature present in the latest version of Avisynth alpha (2.6).
Previous versions are unaffected.

r1814
Automatically restrict QPs to avoid quantization (under|over)flow
--cqm jvt and similar should now work "out of the box" instead of requiring futzing with --qpmin.

r1813
Don't try to get timecodes if reading frame failed
This fixes "input timecode file missing data for frame" warning with piped input where we don't know total number of frames.

r1812
Fix possible overflow in sub4x4_dct in 10-bit builds

r1811
Fix bug in intra-refresh + threads
Intra refresh bar quality increase wasn't correctly applied.

r1810
Fix file handle leak in libx264 on error

r1809
Fix incompatible csp format issue
Problem occurred with unknown pixel formats and non mod2 resolutions in the resize filter.

r1808
Really fix fittobox resize rounding code

r1807
Fix regression in rev1549
Skip auto timebase denominator generation when generated timebase denominator exceeds UINT32_MAX.
Also fix double free.

r1806
Fix --tcfile-in if timecode v2 file starts from nonzero pts

r1805
SPARC/Solaris build fixes

r1804
Fix typo in r1797

r1803
Add Python regression test script

Patch from Google Code-In.

r1802
Make --weightp 1 a better speed tradeoff
Since fade analysis is now so fast, weightp 1 now does fade analysis but no reference duplication.
This is the opposite of what it used to do (reference duplication but no fade analysis).
This also gives weightp's better fade quality to faster presets (up to superfast).

r1801
SSE versions of some high-bit-depth DCT functions
Our first Google Code-In patch!

r1800
Clean up weightp analysis function

r1799
Add API function to return max number of delayed frames

r1798
Copy field order flag in encoder_reconfig

r1797
Cosmetics in configure

r1796
Add some more info to `x264 --version`

r1795
Change qpmin default to 0
There's probably no real reason to keep it at 10 anymore, and lowering it allows AQ to pick lower quantizers in really flat areas.
Might help on gradients at high quality levels.
The previous value of 10 was arbitrary anyways.

r1794
Fix ticks_per_frame check for VFR input

r1793
Fix configure so that boolean configuration options are 1/0

There are many cases of 1/undef, not 1/0.

r1792
Only build SPARC VIS asm if high bit-depth is disabled

r1791
Fix build on SPARC Solaris 10

r1790
Fix resize filter rounding code

r1789
Fix regression in chroma weightp
Missing cache calls could cause artifacts, encoder/decoder desync.

r1788
Fix some crashes with high bit depth
Not all arrays were sufficiently aligned.

r1787
Chroma weighted prediction
Like luma weighted prediction, dramatically improves compression in fades.
Up to 4-8db chroma PSNR gain in extreme cases (short, perfect fade-outs).
On actual videos, helps up to ~1% overall.
One example video with a decent number of fades (ef OP): 0.8% bitrate reduction overall, 7% bitrate reduction just counting chroma.
Fixes a lot of artifacts in fades at lower bitrates.

Original patch by Dylan Yudaken <dyudaken@gmail.com>.

r1786
Support custom cropping rectangles
Supposedly useful for 3D television applications.

r1785
Convert X264_HIGH_BIT_DEPTH to HIGH_BIT_DEPTH
Less verbose.

r1784
x86 asm for high-bit-depth pixel metrics
Overall speed change from these 6 asm patches: ~4.4x.
But there's still tons more asm to do -- patches welcome!

Breakdown from this patch:
~13x faster SAD than C.
~11.5x faster SATD than C (only MMX done).
~18.5x faster SA8D than C.
~19.2x faster hadamard_ac than C.
~8.3x faster SSD than C.
~12.4x faster VAR than C.
~3-4.2x faster intra SAD than C.
~7.9x faster intra SATD than C.

r1783
x86 asm for some high-bit-depth coefficient functions
~7.9x faster denoise than C.
~2.3x faster coeff_level_run than C.
~6.6x faster coeff_last than C.
~4.3x faster decimate_score than C.

Also improve checkasm's decimate_score test.

r1782
x86 asm for high-bit-depth motion compensation
~8x faster qpel MC than C.
~10x faster hpel than C.

r1781
x86 asm for high-bit-depth quant
~3.1-4.2x faster than C.

r1780
x86 asm for high-bit-depth DCT
Only MMX and DCT done so far; iDCT still needs asm as well.
~4.4x faster than C.

r1779
x86 asm for high-bit-depth deblocking
~3.3x faster than C.

r1778
Use a 16-bit buffer in hpel_filter regardless of bit depth
This only works up to and including 10-bit (but we don't support anything higher yet).

r1777
Use enums instead of magic numbers in x264_mb_partition_pixel_table

r1776
Improve configure script logging
Now prints the test program that failed in addition to error messages.

r1775
Fix constrained intra pred mode selection

r1774
Various high-bit-depth ratecontrol fixes

r1773
Fix a crash in --dump-yuv for odd resolutions

r1772
Improve flash detection algorithm change in r1765
Now disables scenecuts only near the real end of the video, not just prior to forced keyframes.

r1771
Update ffms2 support for its latest API break.

r1770
Modify the x264 header accordingly if --disable-gpl is used

r1769
Save a bit of memory with weightp + high bit depth

r1768
Fix bugs in qpfile parsing with omitted QPs

r1767
Fix HRD with intra-refresh
x264 was incorrectly calculating cpb_removal_delay with respect to the first keyframe.
It should have been calculating cpb_removal_delay with respect to the last keyframe.

r1766
Fix bug in r1753
Overflow compensation fix broke CRF with --no-mbtree.

r1765
Improve flash detection's behavior near the end of the video
Flash detection catches situations like AAAABBCCDDDD, where A,B,C,D are frames in different scenes.
x264 would place a keyframe on the first "D".
However, if the video ended on the last "C", x264 would place a keyframe on the first "C", even though C classifies as a flash.
This change fixes this issue.
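
The AAAABBCCDDDD example can be modelled with a toy sketch. This is not x264's algorithm, which works on frame costs inside the lookahead. Here each frame simply carries a scene label, and `min_scene_len` is a hypothetical threshold below which a scene counts as a flash.

```python
from itertools import groupby


def keyframes(scene_labels, min_scene_len=3):
    """Return frame indices that would get a keyframe in this toy model."""
    positions = []
    index = 0
    for _, run in groupby(scene_labels):
        length = len(list(run))
        # The first frame is always a keyframe; later scenes get one only
        # if they last long enough not to be classified as a flash.
        if index == 0 or length >= min_scene_len:
            positions.append(index)
        index += length
    return positions
```

With this model, `keyframes("AAAABBCCDDDD")` places keyframes on the first "A" and the first "D", and `keyframes("AAAABBCC")` does not place one on the first "C", matching the behaviour the entry describes after the fix.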

r1764
Improve quantizer handling
The default value for i_qpplus1 in x264_picture_t is now X264_QP_AUTO. This is currently 0, but may change in the future.
qpfiles no longer use -1 to indicate "auto"; QP is just omitted. The old method should still work though.

CRF values now make sense in high bit depth mode.
--qp should be used for lossless mode, not --crf.
--crf 0 will still work as expected in 8-bit mode, but won't be lossless with higher bit depths.
Add bit depth to statsfiles.

These changes are required to make the QP interface sensible in combination with high bit depth.

r1763
VFR-aware PSNR/SSIM measurement
First step to VFR-aware MB-tree and bit allocation.

r1762
Disable weightp offset=-1 dupes with high bit depth
They're a hack to compensate for crappy rounding, and thus not worth doing at high bit depth, which fixes most of the rounding issues.

r1761
Make the ffmpeg -vpre error message more descriptive

r1760
Add numeric names for the presets (0==ultrafast ... 9==placebo)
This mapping will of course change if new presets are added in between, but will always be ordered from fastest to slowest.
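
As a rough sketch, the numeric names can be thought of as indices into the ordered preset list. The entry above only states the two endpoints; the intermediate names below are the x264 presets as of this change, and the helper `resolve_preset` is hypothetical.

```python
# Ordered from fastest to slowest; index == numeric preset name.
PRESETS = [
    "ultrafast", "superfast", "veryfast", "faster", "fast",
    "medium", "slow", "slower", "veryslow", "placebo",
]


def resolve_preset(value):
    """Accept either a preset name or its numeric alias and return the name."""
    if value.isdigit():
        index = int(value)
        if index >= len(PRESETS):
            raise ValueError("unknown preset number: " + value)
        return PRESETS[index]
    if value in PRESETS:
        return value
    raise ValueError("unknown preset: " + value)
```

Because the numbers are positions in the list, inserting a new preset in the middle would shift the numbers of all slower presets, which is the caveat the entry mentions.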

r1759
Update benchmarks in doc/threads.txt

r1758
Make the #if'd out naive ESA actually match the real implementation

r1757
Move mv/ref prefetch code to the correct location
Prefetching of top blocks should be done under if(top), not if(left).

r1756
Link x264cli explicitly against lavf
Fixes some problems with crappy linkers.

r1755
Fix CBR ratecontrol bug with extremely high qscales
Caused CBR ratecontrol to take a very long time to recover from extreme situations (e.g. /dev/urandom).

r1754
Disable overflow compensation in CRF mode
Wasn't designed with CRF in mind, and acts really weird with CRF+VBV.

r1753
Fix stupid bug in B-frame VBV size prediction

r1752
Fix regression in checkasm in r1666
Buffer is uint16_t* regardless of whether x264 was compiled with high bit depth or not.

r1751
Fix overflows in satd, sa8d and hadamard_ac with high bit depth

r1750
Fix potential problem with overflows in ssd_nv12
The risk of overflows increases exponentially with the bit depth.
The 8-bit asm versions may still overflow with image widths >= 11008 (or 6604 if interlaced).

r1749
Fix syntax for some parameterless functions
Technically, such functions should be declared with (void), not ().

r1748
Fix fps reporting on mingw64
_ftime on mingw64 uses __timeb32 which is broken.
Use ftime instead.

r1747
Fix compilation on PPC with some recent GCCs

r1746
Fix Altivec SATD with small strides
Fixes chroma ME and some of lookahead on PPC.

r1745
Address remaining cacheline split issues in avg2
Slightly improved performance on core 2.
Also fix profiling misattribution of w8/16/20 mmxext cacheline loops.

r1744
Trim a few bytes off some x86 intra pred functions

r1743
Move DTS compression from libx264 to x264cli
DTS compression is an ugly stupid hack and starting to encroach on unrelated areas like VBV.
Some people want it in the mp4 muxer for devices and/or splitters that don't support Edit Boxes.
We just say "throw these broken devices out the window".
DTS compression will remain as a muxer option, --dts-compress, at the user's own risk.
This option is disabled by default.

r1742
Use a larger pic_init_qp with high bit depth
Modify pic_init_qs for consistency.

r1741
Update some of the information in doc/

r1740
Update header in depth.c

r1739
Remove some old unused stuff in the build tree
Regression test (hasn't been updated since svn).
Doxy (was never used).

r1738
Various cosmetics
Exorcise some CamelCase.

r1737
Add missing mod4 stack check to sse2_misalign mc_chroma
Required for ICC compilation.

r1736
Fix 2pass ratecontrol with --nal-hrd cbr

r1735
Fix minor bug in intra pred with intra refresh
i8x8 blocks didn't properly avoid predicting from top-right when necessary.
This could cause intra refresh to not completely refresh the frame.

r1734
Fix filter parsing with --extra-cflags="-DNDEBUG"

r1733
Make sigint handler variable volatile
Didn't actually cause any problems, but is necessary because it can be modified by another thread (the signal call).

r1732
Add High 10 Intra profile support (AVC-Intra)
x264 should now be able to encode compliant AVC-Intra 50.
With a 10-bit-compiled version of x264, a sample commandline for 1080i25 might be:
--interlaced --keyint 1 --vbv-bufsize 2000 --bitrate 50000 --vbv-maxrate 50000 --nal-hrd cbr

Also print "Constrained Baseline" for baseline profile, since that's all x264 (and everything else in the world) supports.
Also reorganize parameter validation a bit to reduce some spurious warnings.
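
For scripted encodes, the sample 1080i25 command line above could be assembled along these lines. This is only a sketch: the function name and the input/output paths are hypothetical, and it assumes an `x264` binary that was compiled for 10-bit output, as the entry requires.

```python
import shlex


def avc_intra_50_args(input_path, output_path):
    """Build an argument list from the sample AVC-Intra 50 options."""
    return [
        "x264",
        "--interlaced",
        "--keyint", "1",            # every frame is a keyframe (intra-only)
        "--vbv-bufsize", "2000",
        "--bitrate", "50000",
        "--vbv-maxrate", "50000",
        "--nal-hrd", "cbr",
        "-o", output_path,          # hypothetical output file
        input_path,                 # hypothetical input file
    ]


# Printable form, e.g. for logging before passing the list to subprocess.run().
command_line = shlex.join(avc_intra_50_args("input.y4m", "output.264"))
```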

r1731
Finish support for high-depth video throughout x264
Add support for high depth input in libx264.
Add support for 16-bit colorspaces in the filtering system.
Add support for input bit depths in the interval [9,16] with the raw demuxer.
Add a depth filter to dither input to x264.

r1730
Chroma mode decision/subpel for B-frames
Improves compression ~0.4-1%. Helps more on videos with lots of chroma detail.
Enabled at subme 9 (preset slower) and higher.

r1729
Various cosmetics

r1728
Make slice-max-size more aggressive in considering escape bytes
The x264 assumption of randomly distributed escape bytes fails in the case of CABAC + an enormous number of identical macroblocks.
This patch attempts to compensate for this.
It is probably safe to assume in calling applications that x264 practically never violates the slice size limitation.
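
To see why identical macroblocks are a bad case, here is a minimal sketch of H.264 emulation prevention ("escape bytes") in Python. It is not x264's `nal_escape` code and it ignores special handling at the end of a NAL unit; it only shows the basic rule that a 0x03 byte is inserted whenever two zero bytes are followed by a byte of value 3 or less.

```python
def nal_escape(payload: bytes) -> bytes:
    """Insert emulation prevention bytes into a raw NAL payload."""
    out = bytearray()
    zeros = 0  # number of consecutive zero bytes already written
    for b in payload:
        if zeros >= 2 and b <= 3:
            out.append(3)  # escape byte breaks up the 00 00 0x pattern
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)
```

A payload consisting only of zero bytes grows by roughly 50% in this sketch, far more than an estimate based on randomly distributed bytes would predict, which is the kind of mismatch the patch compensates for.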

r1727
Add missing emms for dump-yuv

r1726
Fix CFR ratecontrol with timebase != 1/fps
Fixes VBV + DTS compression, among other things.

r1725
Fix DTS/bitrate calculation if the first PTS wasn't zero
Fix bitrate calculation with DTS compression.

r1724
Fix regression in r1716

r1723
Cosmetics in me.c and frame.c

r1722
Add support for arbitrary user SEIs
This allows calling applications to insert SEIs that x264 doesn't know about while maintaining HRD/VBV accuracy.

r1721
Add full chroma input flag to swscale
Improves quality of colorspace conversions involving RGB(A).

r1720
Add --disable-gpl option to configure
Used for commercially-licensed versions of x264.
Doesn't currently change anything, but may be used to disable GPL-only CLI tools, such as video filters, in the future.
Also print the x264 license and libavformat license in version info.

r1719
Update source file headers
Update dates, improve file descriptions, make things more consistent.
Also add information about commercial licensing.

r1718
Fix intra refresh to not exceed max recovery_frame_cnt
The spec constrains recovery_frame_cnt to [0, MaxFrameNum-1].
So make MaxFrameNum bigger in the case of intra refresh.

r1717
Make intra refresh finish one frame faster
In some cases, the last frame of intra refresh was redundant.
Saves a few bits.

r1716
Fix intra refresh to not predict from invalid pixels
The blocks on the right side of the intra refresh column should not predict from top-right.

r1715
Add configure check for mingw64 prefixing
This compensates for the inconsistent prefixing seen in different versions of the compiler.

r1714
Update some Altivec function prototypes
Silences a lot of warnings.

r1713
Add support for level 1b
This level is a stupid hack in the H.264 spec, so it's a stupid hack in x264 too.
Since level is an integer, calling applications need to set level_idc=9 to use it.
String-based option handling will accept "1b" just fine though, so CLI users don't have to worry.
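
A sketch of how a string level might map to `level_idc`, with "1b" as the special case described above. Only the "1b" to 9 mapping comes from this entry; the handling of decimal strings such as "3.1" and of raw values such as "31" is an assumption about typical behaviour, and `parse_level` is a hypothetical helper.

```python
def parse_level(value: str) -> int:
    """Convert a level string to an integer level_idc."""
    if value == "1b":
        return 9  # special value named in the entry above
    number = float(value)
    if number < 6:
        # Assumed: "3.1"-style input, scaled by ten and rounded.
        return int(10 * number + 0.5)
    # Assumed: the value is already an idc such as "31".
    return int(number)
```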

r1712
Use smaller values for idr_pic_id
Saves a few bits and fixes problems on certain fantastically terrible decoders, such as the Apple iPad.

r1711
Use POC type 2 for streams with no B-frames
Saves a few bits per slice header.

r1710
Faster cabac_encode_ue_bypass
Use CLZ + a lut instead of a loop.
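
The idea of replacing the loop with a leading-zero count can be sketched in Python for a k-th order Exp-Golomb code. This only illustrates the arithmetic: it emits plain bit lists instead of driving a CABAC bypass encoder, it omits the lookup table, and `int.bit_length()` stands in for a hardware CLZ instruction. Both function names are hypothetical.

```python
def ue_bypass_loop(k, val):
    """Reference version: find the prefix length one bit at a time."""
    bits = []
    while val >= (1 << k):
        bits.append(1)
        val -= 1 << k
        k += 1
    bits.append(0)
    while k > 0:
        k -= 1
        bits.append((val >> k) & 1)
    return bits


def ue_bypass_clz(k, val):
    """Same code, with the prefix length computed directly."""
    v = val + (1 << k)
    n = v.bit_length() - 1            # floor(log2(v)); CLZ would give this in C
    bits = [1] * (n - k) + [0]        # unary prefix and terminating zero
    bits += [(v >> i) & 1 for i in range(n - 1, -1, -1)]  # low n bits of v
    return bits
```

Both functions produce identical bit sequences; the second one knows the prefix length up front, which is what allows the bits to be written in larger chunks.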

r1709
Faster nal_escape asm

r1708
Allow --demuxer forcing with known extensions

r1707
Minor fixes/cosmetics in commandline parsing

r1706
Fix overflow in stats printing

r1705
Fix bug in 2pass if the first P-frames are all skip
last_qscale_for was read before being initialized in this case, resulting in the value from the previous iteration being used instead.

r1704
Don't do deblock-aware RD if deblocking is off

r1703
CAVLC "trellis"
~3-10% improved compression with CAVLC.
--trellis is now a valid option with CAVLC.
Perhaps more importantly, this means psy-trellis now works with CAVLC.

This isn't a real trellis; it's actually just a simplified QNS.
But it takes enough shortcuts that it's still roughly as fast as a trellis; just not quite optimal.
Thus the name is a bit of a misnomer, but we're reusing the option name because it does the same thing.
A real trellis would be better, but CAVLC is much harder to trellis than CABAC.
I'm not aware of any published polynomial-time solutions that are significantly close to optimal.

r1702
Add global #define for maximum reference count
This should make it easier to play around with reference frame counts that exceed the spec maximum.

r1701
Simplify addressing logic for interlaced-related arrays
In progressive mode, just make [0] and [1] point to the same place.

r1700
Add missing emms to x264_nal_encode
Only matters for applications using the low-latency callback feature.

r1699
Fix 2 bugs with slice-max-size
Macroblock re-encoding didn't restore mv/tex bit counters (slightly inaccurate 2-pass).
Bitstream buffer check didn't work correctly (insanely large frames could break encoding).

r1698
NV12 version of Altivec chroma MC

r1697
Deblock-aware RD
Small quality gain (~0.5%) at lower bitrates, potentially larger with QPRD.
May help more with psy, maybe not.
Enabled at subme >= 9. Small speed cost (a few %).

r1696
Correct X header path usage in configure
Don't unconditionally set the header path for OpenBSD but do so if the --enable-visualize flag is specified.

r1695
Fix lavf input with delayed frames

r1694
Slightly improve the filtering section of x264 --help

r1693
Fix debug message typo with DTS compression

r1692
Try to guess input length for lavf input
Allows printing of progress indicator when using lavf input.

r1691
Workaround bug in fps/timestamp handling with lavf input
reordered_opaque in lavf doesn't work correctly in the identity case (no reordering).
Fixes incorrect output for some file types (e.g. raw in mov).

r1690
Fix aspect ratio writing in the MKV muxer
The braindead Matroska spec dictates aspect ratio to be measured in pixels instead of, well, an actual aspect ratio.

r1689
Add libavcore check in configure

r1688
Improve quantizer distribution with sliced-threads+VBV
Should help avoid cases of very uneven quantizer choice between slices.

r1687
Remove dead code in slicetype.c

r1686
Fix incorrect duration/framerate/bitrate in flv header

r1685
invalidate_reference fixes
invalidate_reference didn't actually invalidate the immediate previous frame, only frames that came before that.
Make sure that reordering is forced when invalidate_reference is used, so that the reference list is correct decoder-side.

r1684
Filtering system-related fixes
Fix configure to check for outdated libavutil in resize filter support.
Do not print an explicit error message in ffms when requesting a frame beyond the number of frames in the source.
Mention in --*help that filtering options can be specified as name=value.
Fix the shadowing warning in the resize filter on posix systems.

r1683
Improve reference_invalid support
Reference invalidation can now be used to invalidate multiple frames at a time, rather than being limited to one per encoder_encode call.

r1682
Eradicate all mention of SI/SP-frames

r1681
Fix stack alignment with MB-tree
Broke 2-pass with MB-tree when calling from compilers with broken stack alignment (e.g. MSVC).

r1680
Avisynth 2.6 colorspace support
Use a customized avisynth_c.h to detect the new planar colorspaces.

r1679
Prevent some cases of cache aliasing.
Avoid cases where image strides were a large power of 2.
Core 2: +3% speed at widths 898..960, +6% at widths 1922..1984, most other resolutions unaffected.
Nehalem and AMD: similar amount of speedup, but fewer resolutions affected.
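
One way to avoid strides that are a large power of 2 is to pad the stride whenever it lands on such a value. The sketch below is an illustration only: the function name and the `align` and `disalign` defaults are assumptions, not values taken from this entry.

```python
def align_stride(width, align=64, disalign=1024):
    """Round width up to `align`, then step off multiples of `disalign`."""
    stride = (width + align - 1) & ~(align - 1)
    # Rows spaced by a multiple of a large power of 2 map to the same
    # cache sets, so add one more alignment unit to spread them out.
    if (stride & (disalign - 1)) == 0:
        stride += align
    return stride
```

For example, `align_stride(1024)` yields 1088 rather than 1024, while a width such as 1100 is only rounded up to 1152 and otherwise left alone.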

r1678
Fix stack alignment for adaptive quant
Broke calls from compilers with broken stack alignment (e.g. MSVC).

r1677
Fix compilation with shared ffmpeg libs
lavf input uses libavutil functions, so it must request flags for libavutil from pkg-config.

r1676
Fix another PCM bug
CABAC assumes that NNZ is 0 or 1, not the number of actual nonzero coefficients.
Didn't actually break the output; only had a tiny effect on RD.

r1675
Fix regression in r1666
Broke encoding of PCM macroblocks.

r1674
Fix build with bit_depth > 8
Definition of x264_cli_plane_copy was inconsistent with declaration.


r1673
Convert x264 to use NV12 pixel format internally
~1% faster overall on Conroe, mostly due to improved cache locality.
Also allows improved SIMD on some chroma functions (e.g. deblock).
This change also extends the API to allow direct NV12 input, which should be a bit faster than YV12.
This isn't currently used in the x264cli, as swscale does not have fast NV12 conversion routines, but it might be useful for other applications.

Note this patch disables the chroma SIMD code for PPC and ARM until new versions are written.

r1672
Add video filtering system to x264cli
Similar to mplayer's -vf system.
Supports some basic operations like resizing and cropping. Will support more in the future.
See the help for more details.

r1671
Eliminate edge cases for MV predictors
Saves a few clocks in mv pred.

r1670
Improve scenecut detection a bit
Put a minimum value on the scenecut threshold; makes x264 more likely to catch successive scenecuts (but might increase the odds of false detection).
This also fixes scenecut detection with keyint=infinite.
Also print keyint=infinite in the x264 SEI and statsfile correctly.

r1669
Fix 8x8dct+slices+no sliced threads+cavlc+deblock
Deblocking was done slightly incorrectly.
Regression in r1612.

r1668
Fix off-by-one error in slice VBV predictor updates

r1667
Fix disabling of progress with --log-level

r1666
Support for 9 and 10-bit encoding
Output bit depth is specified on compilation time via --bit-depth.
There is currently almost no assembly code available for high-bit-depth modes, so encoding will be very slow.
Input is still 8-bit only; this will change in the future.

Note that very few H.264 decoders support >8 bit depth currently.
Also note that the quantizer scale differs for higher bit depth. For example, for 10-bit, the quantizer (and crf) ranges from 0 to 63 instead of 0 to 51.
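
The 10-bit range quoted above follows from the H.264 rule that each extra bit of depth adds 6 to the QP range. A small sketch, where the function name `qp_max` is hypothetical and the 9-bit value is derived from that rule rather than stated in this entry:

```python
def qp_max(bit_depth):
    """Upper end of the quantizer range for a given output bit depth."""
    # 51 is the 8-bit maximum; each additional bit extends the range by 6.
    return 51 + 6 * (bit_depth - 8)
```

So 8-bit gives 51, 9-bit gives 57, and 10-bit gives 63.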

r1665
Support infinite keyint (--keyint infinite).
This just means x264 won't insert non-scenecut keyframes.
Useful for streaming when using interactive error recovery or some other mechanism that makes keyframes unnecessary.

Also change POC logic to limit POC/framenum LSB size (to save bits per slice).
Also fix a bug in the CPB underflow detection code (didn't affect the bitstream, just resulted in the failure to print certain warning messages).

r1664
Don't check i16x16 planar mode unless previous modes were useful
Saves ~160 clocks per MB at subme=1, ~270 per MB at subme>1 (measured on Core i7).
Negligible effect on compression.

Also make a few more arrays static.

r1663
Centralize logging within x264cli
x264cli messages will now respect the log level they pertain to.
Slightly reduces binary size.

r1662
Make open-GOP Blu-ray compatible
Blu-ray is even more braindamaged than we thought.
Accordingly, open-gop options are now "normal" and "bluray", as opposed to display and coded.
Normal should be used in all cases besides Blu-ray authoring.

r1661
Callback feature for low-latency per-slice output
Add a callback to allow the calling application to send slices immediately after being encoded.
Also add some extra information to the x264_nal_t structure to help inform such a calling application how the NAL units should be ordered.

Full documentation is in x264.h.

r1660
Simplify pixel_ads

r1659
Interactive encoder control: error resilience
In low-latency streaming with few clients, it is often feasible to modify encoder behavior in some fashion based on feedback from clients.
One possible application of this is error resilience: if a packet is lost, mark the associated frame (and any referenced from it) as lost.
This allows quick recovery from errors with minimal expense bit-wise.

The new i_dpb_size parameter allows a calling application to tell x264 to use a larger DPB size than required by the number of reference frames.
This lets x264 and the client keep a large buffer of old references to fall back to in case of lost frames.
If no recovery is possible even with the available buffer, x264 will force a keyframe.

This initial version does not support B-frames or intra refresh.
Recommended usage is to set keyint to a very large value, so that keyframes do not occur except as necessary for extreme error recovery.

Full documentation is in x264.h.

Move DTS/PTS calculation to before encoding each frame instead of after.
Improve documentation of x264_encoder_intra_refresh.

r1658
Lookaheadless MB-tree support
Uses past motion information instead of future data from the lookahead.
Not as accurate, but better than nothing in zero-latency compression when a lookahead isn't available.
Currently resets on keyframes, so only available if intra-refresh is set, to avoid pops on non-scenecut keyframes.
Not on by default with any preset/tune combination; must be enabled explicitly if --tune zerolatency is used.

Also slightly modify encoding presets: disable rc-lookahead in the fastest presets.
Enable MB-tree in "veryfast", albeit with a very short lookahead.

r1657
Open-GOP support
Allows B-frames immediately prior to keyframes (in display order).
This helps reduce keyframe popping and improve compression with short keyframe intervals.
Due to a staggering display of braindamage in the Blu-ray spec, two open-GOP modes are available.
The two modes calculate keyframe interval differently: one based on coded distance and one based on display distance.
The latter is superior compression-wise, but for no comprehensible reason, Blu-ray requires the former if open-GOP is used.

r1656
Use threadpools to avoid unnecessary thread creation
Tiny performance improvement with fast settings and lots of threads.
May help more on some OSs with slow thread creation, like OS X.
Unify inconsistent synchronized abbreviations to sync.

r1655
Improve 2-pass bitrate prediction
Adapt based on distance to the end in bits, not in frames.
Helps in videos with absurdly simple end sections, e.g. black frames.

r1654
SSE4 and SSSE3 versions of some intra_sad functions
Primarily Nehalem-optimized.

r1653
Improve HRD accuracy
In a staggering display of brain damage, the spec requires all HRD math to be done in infinite precision despite the output being of quite limited precision.
Accordingly, convert buffer management to work in units of timescale.
These accumulating rounding errors probably didn't cause any real problems, but might in theory cause issues in very picky muxers on extremely long-running streams.

r1652
Use -fno-tree-vectorize to avoid miscompilation
Some versions of gcc have been reported to attempt (and fail) to vectorize a loop in plane_expand_border.
This results in a segfault, so to limit the possible effects of gcc's utter incompetence, we're turning off vectorization entirely.
It's not like it ever did anything useful to begin with.

r1651
Fix SIGPIPEs caused by is_regular_file checks
Check to see if input file is a pipe without opening it.

r1650
Fix compilation on ARM w/ Apple ABI

r1649
Faster mbtree_propagate asm
Replace fp division by multiply with the reciprocal.
Only ~12% faster on penryn, but over 80% faster on amd k8.
Also make checkasm slightly more tolerant to rounding error.
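
The reciprocal trick can be illustrated in a few lines. This is a sketch in Python, not the actual x264 asm; the function names are made up for illustration. The tolerance-based comparison mirrors why a checker has to accept small rounding differences between the two forms.

```python
import math

def scale_by_division(values, divisor):
    """Reference version: one floating-point division per element."""
    return [v / divisor for v in values]

def scale_by_reciprocal(values, divisor):
    """Compute the reciprocal once, then multiply.

    Results may differ from true division in the last bit.
    """
    inv = 1.0 / divisor
    return [v * inv for v in values]

values = [1.0, 7.0, 100.0, 12345.0]
ref = scale_by_division(values, 3.0)
fast = scale_by_reciprocal(values, 3.0)

# Compare with a small relative tolerance instead of exact equality.
matches = all(math.isclose(a, b, rel_tol=1e-12) for a, b in zip(ref, fast))
```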

r1648
Convert the OPT_ defines in x264.c to an enum

r1647
Don't allow baseline profile streams with fake-interlaced
Indicate use of --fake-interlaced in encoding options SEI.

r1646
Allocate space for null terminator in param_apply_tune

r1645
Fix regression in r1501.
Could cause slightly incorrect analysis in rare cases, but no serious encoding issues.
Also shut up gcc warning about pels_v.

r1644
Fix crash with --subme 0 + --weightp > 0. Regression in r1535

r1643
Replace some divisions with shifts
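
As a rough illustration (not taken from the x264 source; the helper name is hypothetical), dividing by a power of two can be done with a right shift:

```python
def div16_shift(x):
    """Divide by 16 using a right shift by 4 bits."""
    return x >> 4

# Each tuple is (input, result of floor division, result of shift).
results = [(x, x // 16, div16_shift(x)) for x in (0, 15, 16, 255, 4096)]
```

Note that in C, integer division truncates toward zero while an arithmetic shift rounds toward negative infinity, so the two only agree for non-negative values.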

r1642
Warn about shadowed variable declarations
Also get rid of a few instances of variable shadowing.

r1641
Template load_pic_pointers based on interlaced
Significantly speeds up cache_load in the non-interlaced case.
Also various other minor optimizations in cache_load and cache_save.

r1640
Remove double-dereferences for MB width/height data
Store it in x264_t instead of going through the SPS.

r1639
Exempt Win x86_64 from memalign hack
The API mandates all mallocs are 16 byte aligned.
Remove unused int that stores sizeof malloc in memalign hack.

r1638
Preprocessing cosmetics
Unify input/output defines to HAVE_* format.
Define values as 1 to simplify conditionals.

r1637
Take more shortcuts in i4x4/i8x8 analysis
Based on the scores of the H and V modes, rule out modes which are unlikely.
Small compression loss (0.1-0.5%) and large speed gain (10-30% faster intra analysis).
Not enabled in slower encoding modes.

Also make C versions of the merged SATD functions in order to eliminate branches based on their availability.

r1636
Display SSIM measurement in dB as well
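
A common way to express SSIM in decibels is -10*log10(1 - SSIM). The sketch below assumes that mapping; the function name and the cap for identical images are illustrative choices, not something stated in this changelog.

```python
import math

def ssim_to_db(ssim):
    """Convert an SSIM score in [0, 1] to decibels as -10*log10(1 - ssim)."""
    inv = 1.0 - ssim
    if inv <= 1e-10:
        return 100.0  # cap for (near-)identical images to avoid log10(0)
    return -10.0 * math.log10(inv)
```

With this mapping, an SSIM of 0.9 is about 10 dB and 0.99 is about 20 dB, which makes small differences near 1.0 easier to read.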

r1635
indicate "M" for local commits too
:Sun Jun 6 15:21:12 2010 +0800

Add error message for invalid [de]muxer selection

r1633
Deduplicate the ALIGN macro, move it to common.h
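
The changelog does not show the macro itself. The conventional round-up-to-a-power-of-two form looks like the following sketch (Python for illustration; `align_up` is a hypothetical name):

```python
def align_up(x, a):
    """Round x up to the next multiple of a, where a is a power of two."""
    assert a > 0 and a & (a - 1) == 0, "alignment must be a power of two"
    return (x + a - 1) & ~(a - 1)
```

For example, a 1080-line frame rounds up to 1088 lines when aligned to 16.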

r1632
Fix a use of ALIGNED_ARRAY_16 on ARM

r1631
Add missing emms after nal_encode
Caused random, bizarre failures with some calling applications.

r1630
Fix crash in fake-interlaced at some resolutions

r1629
Fix no-mbtree + aq-mode=0

Regression in r1618.

r1628
Add API function to fix x264_picture_t initialization
Calling applications that do not use x264_picture_alloc need to use x264_picture_init to initialize x264_picture_t structures.
Previously, if the calling application didn't zero x264_picture_t, Bad Things could happen.

r1627
Fix Avisynth input
Regression in r1624. A more permanent solution to the problem will be committed later.

r1626
Convert to a unified "dctcoeff" type for DCT data
Necessary for future high bit-depth support.

r1625
Convert to a unified "pixel" type for pixel data
Necessary for future high bit-depth support.
Various macros and extra types have been introduced to make operations on variable-size pixels more convenient.

r1624
Add API tool to apply arbitrary quantizer offsets
The calling application can now pass a "map" of quantizer offsets to apply to each frame.
An optional callback to free the map can also be included.
This allows all kinds of flexible region-of-interest coding and similar.
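
A region-of-interest map could be built along these lines. This is only a sketch: it assumes one offset per 16x16 macroblock in row-major order, and the function name, ROI format and default offset are hypothetical. The real data layout and semantics are documented in x264.h.

```python
def build_roi_offsets(width, height, roi, roi_offset=-4.0):
    """Return a flat, row-major list with one offset per 16x16 macroblock.

    roi is (x0, y0, x1, y1) in pixels. Macroblocks overlapping it get
    roi_offset (negative = lower quantizer), all others get 0.0.
    """
    mb_w = (width + 15) // 16
    mb_h = (height + 15) // 16
    x0, y0, x1, y1 = roi
    offsets = [0.0] * (mb_w * mb_h)
    for mb_y in range(mb_h):
        for mb_x in range(mb_w):
            # Top-left pixel of this macroblock.
            px, py = mb_x * 16, mb_y * 16
            if px < x1 and px + 16 > x0 and py < y1 and py + 16 > y0:
                offsets[mb_y * mb_w + mb_x] = roi_offset
    return offsets

# 64x48 frame = 4x3 macroblocks; the ROI covers the two middle ones.
offsets = build_roi_offsets(64, 48, roi=(16, 16, 48, 32))
```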

r1623
x86 assembly code for NAL escaping
Up to ~10x faster than C depending on CPU.
Helps the most at very high bitrates (e.g. lossless).
Also make the C code faster and simpler.
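
For reference, NAL escaping (emulation prevention) inserts a 0x03 byte after two consecutive zero bytes whenever the next byte is 0x03 or less, so the payload can never contain a startcode. A minimal Python sketch of the idea, not the x264 implementation:

```python
def nal_escape(payload: bytes) -> bytes:
    """Insert 0x03 after any two zero bytes that are followed by a byte <= 0x03."""
    out = bytearray()
    zeros = 0  # number of consecutive zero bytes just written
    for b in payload:
        if zeros >= 2 and b <= 3:
            out.append(3)  # emulation prevention byte
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)
```

Long runs of zeros need many inserted bytes, while typical compressed data passes through almost untouched.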

r1622
Re-enable i8x8 merged SATD
Accidentally got disabled when intra_sad_x3 was added.

r1621
Some deblocking-related optimizations

r1620
Optimize out some x264_scan8 reads

r1619
Add fast skip in lookahead motion search
Helps speed very significantly on motionless blocks.

r1618
Merge some of adaptive quant and weightp
Eliminate redundant work; both of them were calculating variance of the frame.

r1617
Fix omission in libx264 tuning documentation

r1616
Fix ultrafast to actually turn off weightb

r1615
Fix crash with MP4-muxing if zero frames were encoded

r1614
Fix cavlc+deblock+8x8dct (regression in r1612)
Add cavlc+8x8dct munging to new deblock system.
May have caused minor visual artifacts.

r1613
Fix 10L in r1612
Stats need to be calculated before deblock strength, not after.
Broke ref stats in x264cli (no effect on actual output).

r1612
Overhaul deblocking again
Move deblock strength calculation to immediately after encoding to take advantage of the data that's already in cache.
Keep the deblocking itself as per-row.

r1611
Detect Atom CPU, enable appropriate asm functions
I'm not going to actually optimize for this pile of garbage unless someone pays me.
But it can't hurt to at least enable the correct functions based on benchmarks.

Also save some cache on Intel CPUs that don't need the decimate LUT due to having fast bsr/bsf.

r1610
Slightly faster mbtree asm

r1609
Faster deblock strength asm on conroe/penryn

r1608
Avoid an extra var2 in chroma encoding if possible
Also remove a redundant if.

r1607
Avoid a redundant qpel check in lookahead with subme <= 1.

r1606
Fix ABR rate control calculations
Incorrect frame numbers were used, resulting in slightly inaccurate ratecontrol.

r1605
Fix calculation of total bitrate printed after stop by CTRL+C

r1604
Fix typo in fake-interlaced documentation

r1603
Fix CABAC+PCM, regression in r1592
Changes to queue in CABAC didn't get propagated to PCM code.

r1602
Fix performance regression in r1582
Set the correct compiler flags.

r1601
Rewrite deblock strength calculation, add asm
Rewrite is significantly slower, but is necessary to make asm possible.
Similar concept to ffmpeg's deblock strength asm.
Roughly one order of magnitude faster than C.
Overall, with the asm, saves ~100-300 clocks in deblocking per MB.

r1600
Fix different output with differing sync-lookahead
Also reduce memory consumption.

r1599
Mark Win32 executable as large address aware

r1598
Add "Fake interlaced" option
This encodes all frames progressively yet flags the stream as interlaced.
This makes it possible to encode valid 25p and 30p Blu-ray streams.
Also put the pulldown help section in a more appropriate place.

r1597
Modify version.sh to output to stdout.
Update configure to match.

r1596
Set correct filesystem permissions for various files

r1595
Fix regression in r1566
Intra stats need to be kept track of for fast intra decision.

r1594
Fix rc-lookahead in encoding options SEI in 2-pass with VBV

r1593
Reduce memory usage in 2-pass with b-adapt 2

r1592
Overhaul CABAC: faster, less cache usage
Horribly munge up the CABAC tables to allow deduplication of some data.
Saves 256 bytes of L1d cache in non-RD, 512 bytes in RD.
Add asm versions of bypass and terminal; save L1i cache by re-using putbyte code.
Further optimize encode_decision.
All 3 primary CABAC functions fit in under 256 bytes of code total on x86_64.

r1591
Fix typo in pulldown

r1590
Fix bitrate calculation in progress status
Was slightly incorrect due to using pts, which is out of order.

r1589
Fix crash with sliced-threads on Phenom

r1588
Fix condition for printing rc=cbr in options SEI
Also fix crf-max formatting.

r1587
Shrink even more constant arrays

r1586
Add API function to trigger intra refresh
Useful for interactive applications where the encoder knows that packet loss has occurred on the client.
Full documentation is in x264.h.

r1585
Fix intra refresh behavior with I-frames
Intra refresh still allows I-frames (for scenecuts/etc).
Now I-frames count as a full refresh, as opposed to instantly triggering a refresh.

r1584
More cosmetics

r1583
Fix unresolved symbol in r1573
gnu ld didn't complain, but some other linkers did.

r1582
Remove unnecessary --enable options
Change --enable-visualize to actually check for X11 support.

r1581
Don't force row QPs to integer values with VBV
VBV should no longer raise the bitrate of the video. That is, at a given quality level or average bitrate, turning on VBV should only lower the bitrate.
This isn't quite true if adaptive quant is off, but nobody should be doing that anyways.
Also may result in slightly more accurate per-row VBV ratecontrol.

r1580
Add field-order detection to y4m demuxer

r1579
Fix sliced-threads + interlaced
Broken in r1546.

r1578
Improve temporal MV prediction
Predict based on the results of p16x16 search, not final MVs.
This lets us get predictions even if mode decision chose intra.
Also improves cache coherency.

r1577
More accurate MV prediction on edges in lookahead

r1576
Error out on invalid input stride
Might catch some crashes due to buggy calling applications.

r1575
Remove unnecessary debugging assert
Shouldn't have been in r1568 to begin with.

r1574
Shrink some more constant arrays

r1573
Deduplicate asm constants, automate name prefixing
Auto-prefix global constants with x264_ in cextern.
Eliminate x264_ prefix from asm files; automate it in cglobal.
Deduplicate asm constants wherever possible to save data cache (move them to a new const-a.asm).
Remove x264_emms() entirely on non-x86 (don't even call an empty function).
Add cextern_naked for a non-prefixed cextern (used in checkasm).

r1572
Shrink a few x86 asm functions
Add a few more instructions to cut down on the use of the 4-byte addressing mode.

r1571
Make options SEI use weight* instead of wpred*
More intuitive and maps more reasonably to the CLI options.
Breaks statsfile backwards-compatibility.

r1570
r1548 broke subme < 3 + p8x8/b8x8
Caused significantly worse compression. Preset-wise, only affected veryfast.
Fixed by not modifying mvc in-place.

r1569
More write-combining

r1568
Reduce lookahead memory usage, cache misses
Merge lowres_types with lowres_costs.

r1567
Fix build on x86 with asm on but SSE off

r1566
Don't calculate ref/partition stats if not necessary

r1565
Split out MV prediction into mvpred.c
Make common/macroblock.c a bit less gigantic.

r1564
Fix mv predictor clipping on non-x86 (regression in r1548)


r1563
Move getopt.c to x264cli sources from libx264
Only affects builds on systems without getopt.c.

r1562
Move deblocking code to a separate file
Should clean up frame.c a bit.

r1561
Fix ffms demuxer to support input timebase values > 2^31

r1560
Fix 10l in cache_load changes
Broke constrained intra pred, probably not anything else.

r1559
Faster fullpel predictor checking
Also shave a few instructions off dia/hex motion estimation loops.

r1558
Fix checkasm's generation of deblock inputs (regression in r1517)

r1557
Fix printing of bitrate when timestamps aren't available
Doesn't affect x264cli, but was broken in some other apps in CFR mode.

r1556
Don't check mv0 twice
One less SAD in motion estimation.
Also rename bmv -> pmv; more accurate naming.

r1555
Remove reordering restrictions from weightp
Apparently the spec does allow two consecutive copies of the same frame in the reference list.
This involves an incredibly ugly hack to wrap around the frame number.
Very slight compression improvement.

r1554
Print intra chroma pred modes in stats

r1553
Add mv0 special case in pskip chroma MC
Significantly faster pskip MC.

r1552
Fix build scripts to work with non-GNU tools

r1551
Faster deblock reference frame checks
Use a lookup table to simplify logic

r1550
Faster chroma CBP handling

r1549
Fix issues with extremely large timebases
With timebase denominators >= 2^30, x264 would silently overflow and cause odd issues.
Now x264 will explicitly fail with timebase denominators >= 2^31 and work with timebase denominators 2^31 > x >= 2^30.

r1548
MMX code for predictor rounding/clipping
Faster predictor checking at subme < 3.

r1547
Fix four minor bugs found by Clang

r1546
Move deblocking/hpel into sliced threads
Instead of doing both as a separate pass, do them during the main encode.
This requires disabling deblocking between slices (disable_deblock_idc == 2).
Overall performance gain is about 11% on --preset superfast with sliced threads.
Doesn't reduce the amount of actual computation done: only better parallelizes it.

r1545
Prefetch MB data in cache_load
Dramatically reduces L1 cache misses.
~10% faster cache_load.

r1544
Fix a ton of pessimization caused by aliasing in cache_save and cache_load

r1543
Add CP128/M128 macros using SSE

r1542
Fix various early terminations with slices
Neighbouring type values (type_top, etc) are now loaded even if the MB isn't available for prediction.
Significant overall performance increase (as high as 5-10%+) with lots of slices (e.g. with slice-max-size).

r1541
Enable --fast-pskip on fast firstpass

r1540
Make interlaced detection in avisynth only apply to field-based input
Fixes improper flagging of progressive sources.

r1539
Set psy=0 in lossless mode
Doesn't actually affect output, just what's written in the SEI.

r1538
Fix a use of sad_x4 that had non-mod64 stride
Minimal speed improvement, but fixes a violation of internal api.

r1537
Make keyint_min auto by default
Gives more reasonable default settings when using short GOPs.

r1536
Faster mv predictor checking at subme < 3
Simplify the predicted MV cost check.

r1535
Special case in qpel refine for subme=1
~15-20% faster qpel refine with subme=1.
Some minor cleanups in refine_subpel.

r1534
Cosmetics: VLC tables

r1533
Add faster mv0 special case for macroblock-tree
Improves performance on low-motion video.

r1532
Add miscompilation check for x264_clz
Running a Phenom-optimized build of x264 (e.g. -march=amdfam10) on a non-Phenom CPU didn't SIGILL; instead it would silently produce incorrect output.
Now, instead, it will error out loudly.

r1531
Fix floating-point exception in level-checking
Doesn't cause any issues for x264cli, but might impact some calling apps that care (e.g. Delphi apps).

r1530
Save a few bits in multislice encoding
Set the initial QP for each slice to the last QP of the previous slice.

r1529
Early termination in 16x8/8x16 search
Combine the actual cost of the first partition with the predicted cost of the second to avoid searching the second when possible.
Reduces the number of times the second partition is searched by up to ~75% in non-RD mode, ~10% in RD mode.
Negligible effect on compression.
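
The test described above might look like the following sketch. How the predicted cost is obtained is not stated in the changelog, so it is simply passed in here; the function and parameter names are hypothetical.

```python
def should_search_second_partition(cost_first, predicted_cost_second, best_cost_so_far):
    """Return True if the second partition is still worth searching.

    If the actual cost of the first partition plus the predicted cost of the
    second already reaches the best cost found so far, this partition shape
    is unlikely to win and the second search can be skipped.
    """
    return cost_first + predicted_cost_second < best_cost_so_far
```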

r1528
Make MV prediction work across slice boundaries
Should improve motion search with lots of small slices, e.g. with slice-max-size.
Still restricted by sliced threads (won't cross the boundary between two threadslices).
The output-changing part of the previous patch.

r1527
Cleanup and simplification of macroblock_load
Doesn't do anything now, but will be useful for many future changes.
Splitting out neighbour calculation will make MBAFF implementation easier.
Calculation of neighbour_frame value (actual neighbouring MBs, ignoring slices) will be useful for some future patches.

r1526
Add missing #include to display-x11.c

r1525
Add TFF/BFF detection to all demuxers
Fix interlaced Avisynth input, automatically weave field-based input.

r1524
Correctly mark output frames as BREF
Simplify pic_out code.

r1523
Fix HRD compliance
As usual, the spec is so insanely obfuscated that it's impossible to get things right the first time.

r1522
Better b16x8/8x16 early termination in B-frames
A bit slower but up to 1-2% better compression.

r1521
Fix 10L in B-skip improvement patch

r1520
Fix printing of SEI header with VBV + ABR
SEI header shouldn't say CBR unless bitrate == maxrate.

r1519
Simplify slicetype_frame_cost
Avoid redundant calculations when VBV is on (due to the intra-only call).
Move most of the logic into per-MB code.

r1518
Faster CABAC state copying for small partitions
Save ~25 clocks per i4x4, i8x8, and sub8x8 RD call.

r1517
Massive cosmetic and syntax cleanup
Convert all applicable loops to use C99 loop index syntax.
Clean up most inconsistent syntax in ratecontrol.c, visualize, ppc, etc.
Replace log(x)/log(2) constructs with log2, and similar with log10.
Fix all -Wshadow violations.
Fix visualize support.

r1516
Fix array overread in b8x16 search

r1515
Faster direct check with subpartitions off
Also simplify the whole function a bit.

r1514
Print crf-max with appropriate precision in SEI

r1513
Fix 10l in timecode seeking

r1512
Fix 10L: Remove needless error check
This error check was for cfr input + --timebase, but that doesn't happen, and brings about a bug with vfr input.

r1511
Don't use 2 L1 refs with pyramid + ref=1
Slightly faster encoding with ref=1.

r1510
Update copyright year in SEI header

r1509
New "superfast" preset, much faster intra analysis

Especially at the fastest settings, intra analysis was taking up the majority of MB analysis time.
This patch takes a ton more shortcuts at the fastest encoding settings, decreasing compression 0.5-5% but improving speed greatly.
Also rearrange the fastest presets a bit: now we have ultrafast, superfast, veryfast, faster.
superfast is the old veryfast (but much faster due to this patch).
veryfast is between the old veryfast and faster.
faster is the same as before except with MB-tree on.

Encoding with subme >= 5 should be unaffected by this patch.

r1508
Avoid redundant MV prediction in duplicate refs

r1507
Cosmetics in mvd handling
Use a 2D array instead of doing manual pointer arithmetic.

r1506
Fix make uninstall on systems with executable suffixes

r1505
Add tune for still image compression
There has been some demand for this from companies looking to use x264 for still image compression (it can outperform JPEG or JPEG-2000 by a factor of 2 or more).
Still image compression is a bit different; because temporal stability isn't an issue, we can get away with far more powerful psy settings.

r1504
Pad non-mod16 resolutions using the correct field

Improves compression of interlaced videos with non-mod16 heights.

r1503
Document slow/fast firstpass in --fullhelp

r1502
Fix some misattributions in profiling
Cycles spent in load_hadamard and the avg2 w16 ssse3 cacheline split code were misattributed.

r1501
Much faster non-RD intra analysis
Since every pred mode costs at least 1 bit, move that part into the initial SATD cost.
This lets i4x4/i8x8 analysis terminate earlier.
If the cost of the predicted mode is less than the cost of signalling any other mode, early-terminate the analysis.
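
A sketch of that early termination follows. The bit costs (1 bit for the predicted mode, 4 bits for any other mode) and all names are assumptions made for illustration. A real encoder would also compute each SATD score on demand rather than receive them all up front.

```python
def pick_i4x4_mode(satd, predicted_mode, lam, pred_bits=1, other_bits=4):
    """satd maps mode -> SATD score. Returns (best_mode, modes_evaluated)."""
    best_mode = predicted_mode
    best_cost = satd[predicted_mode] + lam * pred_bits
    evaluated = 1
    # Any other mode costs at least lam * other_bits even with zero distortion,
    # so if the predicted mode already beats that, nothing else can win.
    if best_cost < lam * other_bits:
        return best_mode, evaluated
    for mode, score in satd.items():
        if mode == predicted_mode:
            continue
        evaluated += 1
        cost = score + lam * other_bits
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, evaluated
```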

r1500
Fix stack alignment in sliced threads
Could cause crashes when called from non-GCC-compiled applications.

r1499
Cosmetics: use sizeof() where appropriate

r1498
Split up analyse_init
Save some time by avoiding some unnecessary inits and moving other parts to per-thread init.

r1497
Reduce stack usage of b-adapt 2's trellis
Also remove some redundant code.

r1496
Various motion estimation optimizations
Faster method of checking MV range.
Predict MVs and cache MVs/MVDs for bidir qpel-RD.
A whole bunch of other minor optimizations.
Slightly better performance and compression.

r1495
Overhaul macroblock_cache_rect
Unify the rectangle functions into a single one similar to ffmpeg's fill_rectangle.
Remove all cases of variable-size cache_rect calls; create a function-pointer-based system for handling such cases.
Should greatly decrease code size required for such calls.

r1494
Make a bunch of small functions ALWAYS_INLINE
Probably no real effect for now, but needed for the next patch.

r1493
Two compatibility fixes
Add IA64 support in configure.

r1492
Faster x264_macroblock_encode_pskip
GCC is apparently unable to optimize out the calculation of a variable when it isn't used.

r1491
Much more accurate B-skip detection at 2 < subme < 7
Use the same method that x264 uses for P-skip detection.
This significantly improves quality (1-6%), but at a significant speed cost as well (5-20%).
It also may have a very positive visual effect in cases where the inaccurate skip detection resulted in slightly-off vectors in B-frames.
This could cause slight blurring or non-smooth motion in low-complexity frames at high quantizers.
Not all instances of this problem are solved: the only universal solution is non-locally-optimal mode decision, which x264 does not currently have.

subme >= 7 or <= 2 are unaffected.

r1490
Reformat profile restrictions in --fullhelp.

Put "no interlaced", "no lossless" on their own line to avoid them running into the default options list.

r1489
Fix typo in configure

r1488
Add support for spaces to iPhone GAS preprocessor script

r1487
Fix slightly wrong mp4 duration.

r1486
Fix link errors with newest gpac cvs
gpac decided to randomly break API and require us to use their own custom malloc and free.

r1485
Save a few bits in slice headers
Don't override the maximum ref index in the slice header if it's the same as the default.
Also update the naming of the relevant variables in the PPS.

r1484
Shrink some arrays in x264_t
Also remove an unnecessary assignment from cache_load.

r1483
Use x264_log in more places instead of fprintf

r1482
Fix two nondeterminisms
Move noise reduction data into thread-specific data.
Use correct reference list for L1 temporal predictors.

r1481
"CRF-max" support with VBV
This is a rather curious feature that may have more use than is initially obvious.
In CRF mode with VBV enabled, CRF-max allows the user to specify a quality level which the encoder will never go below, even due to the effects of VBV.
This is not the same as qpmax, which is not aware of issues like scene complexity.
Setting this WILL cause VBV underflows in any situation where the encoder would have needed to exceed the relevant CRF to avoid underflow.

Why might one want to do this even if it would cause VBV underflows?
In the case of streaming, particularly ultra-low-latency streaming, it may be preferable to drop frames than to display frames that are of too low a quality.
Thus, in extremely complex scenes, rather than display completely awful video, the streaming server could simply drop to a lower framerate.
Scenecuts, which normally look terrible under situations like single-frame VBV, could be handled by just displaying them a bit later and dropping frames to compensate.
In other words, it's better to see the scenecut 150ms delayed than for it to look like a blocky mess for 150ms.

On the caller-side, this would be handled by detecting the output size of x264's frames and dropping future frames to compensate if necessary.

This can also be used in normal encoding simply to ensure that VBV does not hurt quality too much (at the cost of potentially causing underflows).
This can help quite a lot when using single-frame VBV and sliced threads, where VBV can often be somewhat unstable.
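
The caller-side handling could be sketched as a simple bit "debt" counter. This is an illustrative model only, not code from x264cli: the function name and the fixed per-frame channel budget are assumptions.

```python
def plan_frame_drops(frame_sizes_bits, bits_per_frame_interval):
    """Decide which frames to send, given the encoded size of each frame.

    A frame that overshoots its share of the channel creates a debt; later
    frames are dropped (never fed to the encoder, so their size is ignored)
    until the debt has been paid off.
    """
    debt = 0.0
    sent = []
    for size in frame_sizes_bits:
        if debt >= bits_per_frame_interval:
            # Still transmitting an earlier oversized frame: skip this one.
            sent.append(False)
            debt -= bits_per_frame_interval
            continue
        sent.append(True)
        debt = max(0.0, debt + size - bits_per_frame_interval)
    return sent

# The second frame (e.g. a scenecut) is 3.5x the per-frame budget.
decisions = plan_frame_drops([1000, 3500, 900, 900, 900], 1000)
```

Here the oversized frame is sent in full and the two frames after it are dropped, which is the "delayed but clean scenecut" trade-off described above.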

r1480
Blu-ray support: NAL-HRD, VFR ratecontrol, filler, pulldown
x264 can now generate Blu-ray-compliant streams for authoring Blu-ray Discs!
Compliance tested using Sony BD-ROM Verifier 1.21.
Thanks to The Criterion Collection for sponsoring compliance testing!

An example command, using constant quality mode, for 1080p24 content:
x264 --crf 16 --preset veryslow --tune film --weightp 0 --bframes 3 --nal-hrd vbr --vbv-maxrate 40000 --vbv-bufsize 30000 --level 4.1 --keyint 24 --b-pyramid strict --slices 4 --aud --colorprim "bt709" --transfer "bt709" --colormatrix "bt709" --sar 1:1 <input> -o <output>

This command is much more complicated than usual due to the very complicated restrictions the Blu-ray spec has.
Most options after "tune" are required by the spec.
--weightp 0 is not, but there are known bugged Blu-ray player chipsets (Mediatek, notably) that will decode video with --weightp 1 or 2 incorrectly.
Furthermore, note the Blu-ray spec has very strict limitations on allowed resolution/fps combinations.
Examples include 1080p @ 24000/1001fps (NTSC FILM) and 720p @ 60000/1001fps.

Detailed features introduced in this patch:

Full NAL-HRD compliance, with both VBR (no filler) and CBR (filler) modes.
Can be enabled with --nal-hrd vbr/cbr.
libx264 now returns HRD timing information to the caller in the form of an x264_hrd_t.
x264cli doesn't currently use it, but this information is critical for compliant TS muxing.

Full VFR ratecontrol support: VBV, 1-pass ABR, and 2-pass modes.
This means that, even without knowing the average framerate, x264 can achieve a correct bitrate in target bitrate modes.
Note that this changes the statsfile format; first pass encodes made before this patch will have to be re-run.

Pulldown support: libx264 allows the calling application to specify a pulldown mode for each frame.
This is similar to the way that RFFs (Repeat Field Flags) work in MPEG-2.
Note that libx264 does not modify timestamps: it assumes the calling application has set timestamps correctly for pulldown!
x264cli contains an example implementation of caller-side pulldown code.

Pic_struct support: necessary for pulldown and allows interlaced signalling.
Also signal TFF vs BFF with delta_poc_bottom: should significantly improve interlaced compression.
--tff and --bff should be preferred to the old --interlaced in order to tell x264 what field order to use.

Huge thanks to Alex Giladi and Lamont Alston for their work on code that eventually became part of this patch.

r1479
Timecode input/output
--tcfile-in allows a user to specify a timecode v1 or v2 file to override input timestamps.
Useful for dealing with VFR input, especially when FFMS/LAVF support isn't available.
--tcfile-out writes a timecode v2 file containing the timecodes of the output file.
New --timebase option allows a user to change the stream timebase.
Intended primarily for forcing timebase with timecode files if necessary.
When using --seek, note that x264 will seek in the timecode file as well.
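
For orientation, a timecode v2 file is a header line followed by one timestamp in milliseconds per frame. The sketch below generates such text from per-frame durations; the helper name is made up, and the exact header string and number formatting should be checked against the tool that will read the file.

```python
def timecodes_v2(frame_durations_ms):
    """Return the text of a timecode v2 file: a header line, then one
    presentation timestamp in milliseconds per frame."""
    lines = ["# timecode format v2"]
    t = 0.0
    for d in frame_durations_ms:
        lines.append(f"{t:.6f}")
        t += d
    return "\n".join(lines) + "\n"

# Two frames at 25 fps followed by two frames at 50 fps (VFR).
text = timecodes_v2([40, 40, 20, 20])
```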

r1478
Mixed-refs support for B-frames
Small speed cost, usually a few percent at most. Generally has lowest cost in cases when it isn't very useful. Up to ~2% better compression overall on highly complex sources.

Also fix a few minor bugs in B-frame analysis and various bits of cleanup.

r1477
Faster rounding of chroma DC coefficients

r1476
Faster cabac_encode_decision_asm
Minimizes instruction count, which also means smaller code.
Various other slight changes to allow more instruction level parallelism.

r1475
Faster hpel_filter
On ssse3, use pmaddubsw for h filter too (similar to v filter).
Change 32-bit v and c filters to write the result non-temporal.
Add commented-out defines to disable non-temporal operation.
Hardly any black magic here, but still a measurable win especially for ssse3.

r1474
Ignore XYSCSS in y4m if the newer standard C tag is present

Apparently y4mscaler will generate 4:2:0 files with XYSCSS set to 444

r1473
Fix regression in r1450
I_PCM blocks would cause x264 to crash or generate bad output. Simplify PCM handling.

r1472
Fix crash with intra-refresh + aq-mode 0

r1471
Fix regression in r1453
r1453 broke psy-trellis with --trellis 2

r1470
Fix regression in r1449
Incorrectly placed thread MV check could result in rare thread MV internal errors, esp. with --non-deterministic.
These weren't fatal errors (x264 could recover and continue with slight compression loss).

r1469
Cut size of MVD arrays by a factor of 2 again
Only store the MVDs of the edges of each MB.

Thanks to Michael Niedermayer for the idea.

r1468
Disable Altivec and VIS optimizations when --disable-asm is specified

r1467
Fix a buffer overread on odd input resolutions

r1466
Fix one bug, one corner case in VBV
qp_novbv wasn't set correctly for B-frames.
Disable ABR code for frames with zero complexity.
Disable ABR code for CBR mode; it is completely unnecessary and can have negative consequences.

r1465
Port Mans Rullgard's NEON intra prediction functions from ffmpeg

r1464
Remove unused function
Two other minor fixes.

r1463
Use short startcode in more possible situations
Previous patch didn't cover all possible uses according to B.1.2.

r1462
Fix fastfirstpass
Apparently the libx264 preset changes made "fastfirstpass" into "fastsecondpass" inadvertently.

r1461
Fix various silly errors in the previous patches

r1460
Actually error out if preset/tune/profile is invalid
Got lost somewhere in the move to libx264-based presets.

r1459
Faster probe_skip, 2x2 DC transform handling
Move the 2x2 DC DCT into the dct_dc asm function to avoid some store-to-load forwarding penalties and extra register loads.
Use dct_dc as part of the early termination in probe_skip.
x86 asm partially by Holger Lubitz.
ARM NEON asm by David Conrad.

r1458
Use short startcodes whenever possible
Saves one byte per frame for every slice beyond the first.
Only applies to Annex-B output mode.
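
The saving can be checked with a toy Annex-B writer. This sketch assumes only the first NAL unit of a frame keeps the 4-byte startcode; the precise rules are in Annex B of the H.264 spec, and the helper below is hypothetical.

```python
LONG_STARTCODE = b"\x00\x00\x00\x01"   # 4 bytes
SHORT_STARTCODE = b"\x00\x00\x01"      # 3 bytes

def annexb_frame(slice_nals, use_short=True):
    """Concatenate the slice NAL units of one frame with Annex-B startcodes."""
    out = bytearray()
    for i, nal in enumerate(slice_nals):
        # Sketch assumption: only the first NAL of the frame keeps the long form.
        short = use_short and i > 0
        out += SHORT_STARTCODE if short else LONG_STARTCODE
        out += nal
    return bytes(out)

# Four dummy slice payloads standing in for real NAL units.
slices = [b"\x65" + bytes([i]) * 10 for i in range(1, 5)]
saved = len(annexb_frame(slices, use_short=False)) - len(annexb_frame(slices))
```

With four slices per frame, three of them get the short startcode, so three bytes are saved per frame.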

r1457
New algorithm for AQ mode 2
Combines the auto-ness of AQ2 with a new var^0.25 instead of log(var) formula.
Works better with MB-tree than the old AQ mode 2 and should give higher SSIM.

r1456
Abide by the MinCR level limit
Some Blu-ray analyzers were complaining about this.

r1455
Make b-pyramid normal the default
Now that b-pyramid works with MB-tree and is spec compliant, there's no real reason not to make it default.
Improves compression 0-5% depending on the video.
Also allow 0/1/2 to be used as aliases for none/strict/normal (for conciseness).

r1454
Move presets, tunings, and profiles into libx264
Now any application calling libx264 can use them.
Full documentation and guidelines for usage are included in x264.h.

r1453
Faster, more accurate psy-RD caching
Keep more variants of cached Hadamard scores and only calculate them when necessary.
Results in more calculation, but simpler lookups.
Slightly more accurate due to internal rounding in SATD and SA8D functions.

r1452
Much faster and more efficient MVD handling
Store MV deltas as clipped absolute values.
This means CABAC no longer has to calculate absolute values in MV context selection.
This also lets us cut the memory spent on MVDs by a factor of 2, speeding up cache_mvd and reducing memory usage by 32*threads*(num macroblocks) bytes.
On a Core i7 encoding 1080p, this is about 3 megabytes saved.
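
The 3 megabyte figure can be reproduced from the formula above. The thread count of 12 is an assumption (for example, 8 logical cores with a 1.5x threads rule); the changelog does not state it.

```python
def mvd_bytes_saved(width, height, threads, bytes_per_mb=32):
    """Evaluate 32 * threads * (number of macroblocks) for a given frame size."""
    mb_w = (width + 15) // 16   # 1920 -> 120 macroblocks wide
    mb_h = (height + 15) // 16  # 1080 -> 68 macroblocks high
    return bytes_per_mb * threads * mb_w * mb_h

saved = mvd_bytes_saved(1920, 1080, threads=12)  # 3,133,440 bytes, about 3 MB
```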

r1451
Add temporal predictor support to interlaced encoding
0.5-1% better compression in interlaced mode

r1450
Keep track of macroblock partitions
Allows vastly simpler motion compensation and direct MV calculation.

r1449
Much faster and simpler direct spatial calculation

r1448
SimpleBlock requires Matroska Doctype v2

r1447
Add GPAC version check

r1446
Fix stupid regression in interlaced in r1430
With ref > 8 or b-pyramid, an array over-read could cause slightly incorrect B-frames.

r1445
Fix overread of scratch buffer
Could cause crashes on non-mod16 frames.

r1444
Fix integer overflow in chroma SSD check
Could cause bad skips at very high quantizers on extreme inputs.

r1443
Fix I and B-frame QPs with threads
Rounding errors resulted in slightly wrong QPs with threads enabled.

r1442
Fix compilation on ARM

r1441
Remove unnecessary PIC support macros
yasm has a directive to enable PIC globally

r1440
Don't even try direct temporal when it would give junk MVs
In PbBbP pyramid structure, the last "b" cannot use temporal because L0Ref0(L1Ref0) != L0Ref0.
Don't even bother analyzing it, just use spatial.
Should improve speed and direct auto effectiveness in CRF and 1-pass modes when b-pyramid is used.
Also makes --direct temporal useful with --b-pyramid, since it will fall back to spatial for frames where temporal is broken.

r1439
iPhone compilation support
Also add --sysroot to configure options

To build for iPhone 3gs / iPod touch 3g:
CC=/Developer/Platforms/iPhoneOS.platform/Developer/usr/bin/gcc ./configure --host=arm-apple-darwin --sysroot=/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS3.0.sdk

For older devices, add
--extra-cflags='-arch armv6 -mcpu=arm1176jzf-s' --extra-ldflags='-arch armv6' --disable-asm

r1438
ARM NEON versions of weightp functions

r1437
Use #ifdef instead of #if in checkasm

r1436
Make the ABR buffer consider the distance to the end of the video
Should improve bitrate accuracy in 2-pass mode.
May also slightly improve quality by allowing more variation earlier-on in a file.

Also fix abr_buffer with 1-pass: it does something very different than what it does for 2-pass.
Thus, the earlier change that increased it based on threads caused 1-pass ABR to be somewhat less accurate.

r1435
Mark cli_input/output_t variables as const when possible

r1434
mkv: Write the x264 version into the file header

This only updates the "writing application"; matroska_ebml.c is the
"muxing application", but the version string for that is still hardcoded.

r1433
mkv: Write SimpleBlock instead of Block for frame headers

mkvtoolnix writes these by default since 2009/04/13.
Slightly simplifies muxer and allows 'mkvinfo -s' to show B-frames
as 'B' (but not B-ref frames).

r1432
Allow | as a separator between psy-rd and psy-trellis values.
[,:/] are all taken when setting psy-trellis in a zone in an mencoder option.
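
A rough sketch of what accepting an extra separator looks like. This is not x264's parser; it assumes that ':' and ',' were the previously accepted separators and that a missing psy-trellis value defaults to 0, neither of which is stated in the commit message.

```python
import re


def parse_psy_rd(value):
    """Split 'psy-rd<sep>psy-trellis' into two floats (hypothetical helper)."""
    # Accept ':' or ',' (assumed existing separators) as well as the new '|'.
    parts = re.split(r"[:,|]", value)
    psy_rd = float(parts[0])
    # Assumed default when only psy-rd is given.
    psy_trellis = float(parts[1]) if len(parts) > 1 else 0.0
    return psy_rd, psy_trellis


print(parse_psy_rd("1.0|0.15"))
```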

Also fix a comment typo and remove a useless line of code.

r1431
Backport various speed tweak ideas from ffmpeg
Add mv0 early termination to spatial direct calculation
Up to twice as fast direct mv calculation on near-motionless video.

Branchless CAVLC level code adjustment based on trailing ones.
A few clocks faster.

Check tc value before clipping in C version of deblock functions.
Much faster, but nobody uses those anyways.

Thanks to Michael Niedermayer for the ideas.

r1430
Implement direct temporal + interlaced
This was much easier than I expected.
It will also be basically useless until TFF/BFF support gets in, since it requires delta_poc_bottom to be set correctly to work well.

r1429
Allow longer keyints with intra refresh
If a long keyint is specified (longer than macroblock width-1), the refresh will simply not occur all the time.
In other words, a refresh will take place, and then x264 will wait until keyint is over to start another refresh.
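
The behaviour with a long keyint can be pictured as a sweep followed by an idle period. The sketch below is only a model of the description above: it assumes a full sweep lasts (macroblock width - 1) frames, as the parenthetical suggests, and the function name is invented.

```python
def refresh_active(frame, keyint, mb_width):
    """True while an intra refresh sweep is in progress at this frame."""
    sweep_len = mb_width - 1  # assumed duration of one full sweep, in frames
    if keyint <= sweep_len:
        # Short keyint: refresh is treated as continuous in this model.
        return True
    # Long keyint: sweep once, then idle until the keyint period ends.
    return frame % keyint < sweep_len


# 1280 pixels wide -> 80 macroblock columns, keyint 250.
print([f for f in (0, 78, 79, 249, 250) if refresh_active(f, 250, 80)])
```

In this model, frames 0 to 78 of each 250-frame period belong to a sweep and frames 79 to 249 are idle.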

r1428
Overhaul sliced-threads VBV
Make predictors thread-local and allow each thread to poll the others to get their predicted sizes.
Many, many other tweaks to improve quality with small VBV and sliced threads.
Note this may somewhat increase the risk of a VBV underflow in such extreme situations (single-frame VBV).
This is tolerable, as most relevant use-cases are better off with a few rare underflows (even if they have to drop a slice) than consistent low quality.

r1427
Print psy-(rd|trellis) with more precision in userdata SEI

r1426
More formatting fixes in x264 help

r1425
Faster 2x2 chroma DC dequant

r1424
Write PASP atom in mp4 muxing
Adds container-level aspect ratio support for mp4.

r1423
Fix 2-pass ratecontrol continuation in case of missing statsfile
Didn't work properly if MB-tree was enabled.

r1422
Smarter QPRD
Catch some cases in which RD checks can be avoided; reduces QPRD RD calls by 10-20%.

r1421
Fix subpel iteration counts with B-frame analysis and subme 6/8
Since subme 6 means "like subme 5, except RD on P-frames", B-frame analysis
shouldn't use the RD subpel counts at subme 6. Similarly with subme 8.
Slightly faster (and very marginally worse) compression at subme 6 and 8.

r1420
Simplify decimate checks in macroblock_encode
Also fix a misleading comment.

r1419
Improve bidir search, fix some artifacts in fades
Modify analysis to allow bidir to use different motion vectors than L0/L1.
Always try the <0,0,0,0> motion vector for bidir.
Eliminates almost all errant motion vectors in fades.
Slightly improves PSNR as well (~0.015 dB).

r1418
Slightly faster predictor_difference_mmxext

r1417
Add ability to adjust ratecontrol parameters on the fly
encoder_reconfig and x264_picture_t->param can now be used to change ratecontrol parameters.
This is extraordinarily useful in certain streaming situations where the encoder needs to adapt the bitrate to network circumstances.

What can be changed:
1) CRF can be adjusted if in CRF mode.
2) VBV maxrate and bufsize can be adjusted if in VBV mode.
3) Bitrate can be adjusted if in CBR mode.
However, x264 cannot switch between modes and cannot change bitrate in ABR mode.

Also fix a bug where x264_picture_t->param reconfig method would not always be frame-exact.

Commit sponsored by SayMama video calling.

r1416
Fix regression in r1406
Bitrate was printed incorrectly for some input framerates.

r1415
Fix log2f detection, include order, some gcc warnings
r1413 caused crashes on any system with malloc.h.
Also switch to std=c99 or std=gnu99 if supported by the compiler.
Fix visualize support.

r1414
Fix abstraction violations in x264.c
No calling application--not even x264cli--should ever look inside x264_t.

r1413
Move -D CFLAGS to config.h

r1412
Fix stat with large file support

r1411
Implement ffms2 version check
Depends on ffms2 version 2.13.1 (r272).
Tries pkg-config's built-in version checking first.
Uses only the preprocessor to avoid cross-compilation issues.

r1410
Fix implicit CBR message to only print when in ABR mode
Also make it print outside of debug mode.

r1409
Add configure check for log2 support
Some incredibly braindamaged operating systems, such as FreeBSD, blatantly ignore the C specification and omit certain functions that are required by ISO C.
log2f is one of these functions that periodically goes missing in such operating systems.

r1408
Add config.log support
Now, if configure fails, you'll be able to see why.

r1407
Fix cross-compiling with lavf, add support for ffms2.pc
Also update configure script to work with newest ffms.

r1406
Improve DTS generation, move DTS compression into libx264
This change fixes some cases in which PTS could be less than DTS.
Additionally, a new parameter, b_dts_compress, enables DTS compression.
DTS compression eliminates negative DTS (i.e. initial delay) due to B-frames.
The algorithm changes timebase in order to avoid duplicating DTS.
Currently, in x264cli, only the FLV muxer uses it. The MP4 muxer doesn't need it, as it uses an EditBox instead.

r1405
Various threading-related cosmetics
Simplify a lot of code and remove some unnecessary variables.

r1404
Hardcode the bs_t in cavlc.c; passing it around is a waste
Saves ~1.5kb of code size, very slight speed boost.

r1403
Fix lavf input with pipes and image sequences
x264 should now be able to encode from an image sequence using an image2-style formatted string (e.g. file%02d.jpg).
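
For readers unfamiliar with image2-style patterns, the sketch below shows how a printf-style pattern such as file%02d.jpg expands to file names. The starting index and frame count are arbitrary values chosen for illustration, and the helper name is invented.

```python
def expand_sequence(pattern, first, count):
    """Expand a printf-style pattern into consecutive file names."""
    return [pattern % n for n in range(first, first + count)]


print(expand_sequence("file%02d.jpg", 1, 3))
# ['file01.jpg', 'file02.jpg', 'file03.jpg']
```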

r1402
Fix bitstream alignment with multiple slices
Broke multi-slice encoding on CPUs without unaligned access.
New system simply forces a bitstream realignment at the start of each writing function and flushes when it reaches the end.

r1401
Merge nnz_backup with scratch buffer
Slightly less memory usage.

r1400
Use cross-prefix properly with pkg-config for cross-compiling

r1399
Various performance optimizations
Simplify and compact storage of direct motion vectors, faster --direct auto.
Shrink various arrays to save a bit of cache.
Simplify and reorganize B macroblock type writing in CABAC.
Add some missing ALIGNED macros.

r1398
Fix crash on new AMD M300 and similar CPUs
Apparently these CPUs have SSE4a, but not misaligned SSE.

r1397
Fix intra refresh with subme < 6
Also improve the quality of intra masking.

r1396
Add support for multiple --tune options
Tunes apply in the order they are listed in the case of conflicts.
Psy tunings, i.e. film/animation/grain/psnr/ssim, cannot be combined.
Also clarify --profile, which forces the limits of a profile, not the profile itself.
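
The combination rule can be sketched as a small check. This is an illustration rather than x264's own code: the list of psy tunings is taken from the text above, while the function name and error message are made up.

```python
# Psy tunings named in the commit message; at most one may be used.
PSY_TUNES = {"film", "animation", "grain", "psnr", "ssim"}


def check_tunes(tunes):
    """Return the tunes in application order, rejecting two psy tunings."""
    psy = [t for t in tunes if t in PSY_TUNES]
    if len(psy) > 1:
        raise ValueError("psy tunings cannot be combined: " + ", ".join(psy))
    # Later entries win on conflicts because they are applied last.
    return list(tunes)


print(check_tunes(["film", "zerolatency"]))
```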

r1395
Various bugfixes and tweaks in analysis
Fix the oldest-ever bug in x264: b16x8 analysis used the wrong width for predict_mv.
Fix cache_ref calls for slightly better MV prediction in bsub16x16 analysis.
Make B-partition analysis consider reference frame costs.
Various other minor changes.
Overall very slightly improved mode decision and motion search in B-frames.

r1394
More --me tesa optimizations

r1393
Fix typo in configure

r1392
Make --fps force CFR mode

r1391
Eliminate intentional array overflow in quant matrix handling
While it probably never caused problems, it was incredibly ugly and evil.

r1390
Faster --me tesa

r1389
Fix static pthreads + dynamically linked x264 on win32
Add the necessary static pthread initialization code to a new DLLmain function.

r1388
Add getopt_long to the included getopt.c
Fixes option handling on OSs that have a nonworking/missing getopt (e.g. Solaris).

r1387
Faster psy-trellis init
Remove some unnecessary zigzags.

r1386
Simplify intra mode availability handling
Slightly faster, 1.5kb smaller binary size, less code.

r1385
Fix free callback, add x264_encoder_parameters function
x264 would try to use the passed param struct after freeing if the param_free callback was set.
Probably didn't cause any issues, as probably no programs used the callback in this location yet.

A new x264_encoder_parameters function is now available in the API.
This function lets the calling application grab the current state of the encoder's parameters.
Use this in x264cli to ensure that the param struct used for set_param is updated with whatever changes x264_encoder_open has made to it.

Patch partially by Anton Mitrofanov <BugMaster@narod.ru>.

r1384
Fix x264 compilation on Apple GCC
Apple's GCC stupidly ignores the ARM ABI and doesn't give any stack alignment beyond 4.

r1383
Faster weightp motion search
For blind-weight dupes, copy the motion vector from the main search and qpel-refine instead of doing a full search.
Fix the p8x8 early termination, which had unexpected results when combined with blind weighting.
Overall, marginally reduces compression but should potentially improve speed by over 5%.

r1382
More correct padding constants for lowres planes
Since lowres analysis isn't interlace-aware, we don't need to double the vertical padding for interlaced video.

r1381
Fix some invalid reads caught by valgrind
Temporal predictor calculation was misled by invalid reference counts for I-frames.

r1380
Periodic intra refresh
Uses SEI recovery points, a moving vertical "bar" of intra blocks, and motion vector restrictions to eliminate keyframes.
Attempt to hide the visual appearance of the intra bar when --no-psy isn't set.
Enabled with --intra-refresh.
The refresh interval is controlled using keyint, but won't exceed the number of macroblock columns in the frame.
Greatly benefits low-latency streaming by making it possible to achieve constant framesize without intra-only encoding.
Combined with --slice-max-size for one slice per packet, tests suggest effective resilience against packet loss as high as 25%.
x264 is now the best free software low-latency video encoder in the world.

Accordingly, change the API to add b_keyframe to the parameters present in output pictures.
Calling applications should check this to see if a frame is seekable, not the frame type.

Also make x264's motion estimation strictly abide by horizontal MV range limits in order for PIR to work.
Also fix a major bug in sliced-threads VBV handling.
Also change "auto" threads for sliced threads to "cores" instead of "1.5*cores" after performance testing.
Also simplify ratecontrol's checking of first pass options.
Also some minor tweaks to row-based VBV that should improve VBV accuracy on small frames.

r1379
LAVF/FFMS input support, native VFR timestamp handling
libx264 now takes three new API parameters.
b_vfr_input tells x264 whether or not the input is VFR, and is 1 by default.
i_timebase_num and i_timebase_den pass the timebase to x264.
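
As a reminder of how a timebase is used, a timestamp counted in timebase units converts to seconds by multiplying it by num/den. The sketch below uses exact fractions; the 1/1000 timebase and the sample timestamps are assumptions chosen for illustration.

```python
from fractions import Fraction


def pts_to_seconds(pts, timebase_num, timebase_den):
    """Convert a timestamp in timebase units to seconds, exactly."""
    return pts * Fraction(timebase_num, timebase_den)


# Variable frame rate: frame spacing differs, the timebase stays 1/1000.
for pts in (0, 33, 67, 100):
    print(pts, "->", float(pts_to_seconds(pts, 1, 1000)), "s")
```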

x264_picture_t now returns the DTS of each frame: the calling app need not calculate it anymore.

Add libavformat and FFMS2 input support: requires libav* and ffms2 libraries respectively.
FFMS2 is _STRONGLY_ preferred over libavformat: we encourage all distributions to compile with FFMS2 support if at all possible.
FFMS2 can be found at http://code.google.com/p/ffmpegsource/.
--index, a new x264cli option, allows the user to store (or load) an FFMS2 index file for future use, to avoid re-indexing in the future.

Overhaul the muxers to pass through timestamps instead of assuming CFR.
Also overhaul muxers to correctly use b_annexb and b_repeat_headers to simplify the code.
Remove VFW input support, since it's now pretty much redundant with native AVS support and LAVF support.
Finally, overhaul a large part of the x264cli internals.

--force-cfr, a new x264cli option, allows the user to force the old method of timestamp handling. May be useful in case of a source with broken timestamps.
Avisynth, YUV, and Y4M input are all still CFR. LAVF or FFMS2 must be used for VFR support.

Do note that this patch does *not* add VFR ratecontrol yet.
Support for telecined input is also somewhat dubious at the moment.

Large parts of this patch by Mike Gurlitz <mike.gurlitz@gmail.com>, Steven Walters <kemuri9@gmail.com>, and Yusuke Nakamura <muken.the.vfrmaniac@gmail.com>.

r1378
More help typo fixes

r1377
Fix x264_clz on inputs > 1<<31
(though x264 never generates such inputs)

r1376
Don't do sum/ssd analysis if weightp == 1
Typo fixes in comments and help.

r1375
Fix two bugs in 2-pass ratecontrol
last_qscale_for wasn't set during the 2pass init code.
abr_buffer was way too small in the case of multiple threads, so accordingly increase its buffer size based on the number of threads.
May significantly increase quality with many threads in 2-pass mode, especially in cases with extremely large I-frames, such as anime.

r1374
Avisynth-MT and 2.6 compatibility fixes
Explain to the user why YV12 conversion is forced with Avisynth 2.6.
Fix encoding with Avisynth-MT scripts by inserting the necessary Distributor() call; speeds such scripts back up to expected levels.

r1373
Fix zone parsing on mingw
Due to MinGW evidently being in the hands of a pack of phenomenal idiots, MinGW does not have strtok_r, a basic string function.
As such, remove the dependency on strtok_r in zone parsing.
Previously, using zones for anything other than ratecontrol failed.

r1372
More lookahead optimizations
Under subme 1, don't do any qpel search at all and round temporal MVs accordingly.
Drop internal subme with subme 1 to do fullpel predictor checks only.
Other minor optimizations.

r1371
missing changes from previous commits

r1370
Fix regression in direct=auto/temporal in r1364
Bug caused rare race condition in frame reference handling.
This resulted in invalid bitstreams in some B-frames and, very rarely, crashes.

r1369
Add fast pskip to x264 SEI info header

r1368
Minor seeking fix with Avisynth input
Seeking past the end of the input with --seek would result in the same frame being repeated over and over.

r1367
Add support for MB-tree + B-pyramid
Modify B-adapt 2 to consider pyramid in its calculations.
Generally results in many more B-frames being used when pyramid is on.
Modify MB-tree statsfile reading to handle the reordering necessary.
Make differing keyint or pyramid between passes into a fatal error.

r1366
Use aliasing-avoidance macros in array_non_zero

r1365
MMX version of 8x8 interlaced zigzag
Just as fast as SSSE3 on Nehalem (and faster on Conroe/Penryn), so remove the SSSE3 version.

r1364
Bring back slice-based threading support
Enabled with --sliced-threads
Unlike normal threading, adds no encoding latency.
Less efficient than normal threading, both performance and compression-wise.
Useful for low-latency encoding environments where performance is still important, such as HD videoconferencing.
Add --tune zerolatency, which eliminates all x264 encoder-side latency (no delayed frames at all).
Some tweaks to VBV ratecontrol and lookahead (in addition to those required by sliced threading).
Commit sponsored by a media streaming company that wishes to remain anonymous.
:Mon Dec 7 18:17:29 2009 -0800

Add more detailed help for presets/tunes/profiles
Shows what options they represent.

r1362
qpel RD no longer needs mbcmp_unaligned

r1361
ensure that all boolean options are {0,1} so they print consistently in the options SEI

r1360
Actually do r1356
Somehow commit r1356 got lost in the ether. I'm not sure how, but now it's fixed.

r1359
Remove some unused code from x264.c

r1358
SSSE3 version of zigzag_8x8_field
Slightly faster interlaced encoding with 8x8dct.
Helps most on Nehalem, somewhat disappointing on Conroe/Penryn.

r1357
Fix crash in interlaced with >8 refs
Crash introduced in weightp.

r1356
Significantly faster qpel-RD
Cache the results of MC, like in bidir-RD.
Slightly changes output due to the necessary reordering of satd/RD calls.
5-10% faster qpel-RD.

r1355
Add x264 prefix to functions with ffmpeg equivalents
Not important now, but will be when we add libav* input support.

r1354
10L in r1353
Broke mp4 output.

r1353
Enhanced Avisynth input support
Requires avisynth_c.h from the Avisynth API headers.
Reports errors properly from Avisynth script input.
Automatically construct input scripts for almost any input file.
Tries ffmpegsource2, DSS2, directshowsource, and many other sourcing methods, based on the input file extension.
Automatically converts to YV12.

r1352
Much faster weightp
Move sum/ssd calculation out of lookahead and do it only once per frame.
Also various minor optimizations, cosmetics, and cleanups.

r1351
Fix bugs in fps/timestamp handling in FLV muxer

r1350
Fix bug in weightp analysis
Weights weren't reset upon early terminations, so old (wrong) weights could stick around.
Small compression improvement.

r1349
Minor deblocking optimization, update comments

r1348
Fix weightb with delta_poc_bottom
Has no effect yet, but will be required once we add TFF/BFF signalling support in interlaced mode.
Gives 0.5-0.7% better compression with proper TFF/BFF signalling.

r1347
Give more meaningful error if 1st/2nd pass resolution differ

r1346
Fix extremely rare deadlock with sync-lookahead
Patch partially by Anton Mitrofanov.

r1345
Only print weightp stats if there were P-frames

r1344
Faster lookahead with subme=1
If it hasn't been clear already, don't use subme=1 as a "fast first pass" option.
Use subme=2 instead; 1 and below now enable a fast (lower quality) lookahead mode.

r1343
Faster weightp analysis
Modify pixel_var slightly to return the necessary information and use it for weight analysis instead of sad/ssd.
Various minor cosmetics.

r1342
Fix two issues in weightp
If analysis decided on an offset of -128, x264 would create non-compliant streams.
Fix some cases with nearly all intra blocks where analysis could pick very weird weights.
Also add some asserts to check compliance.

r1341
Allow compilation with non-Apple GCC on OS X

r1340
Use __attribute__((may_alias)) for type-punning
GCC thinks pointer casts to unions aren't valid with strict aliasing.
See http://gcc.gnu.org/onlinedocs/gcc-4.4.2/gcc/Optimize-Options.html#Type_002dpunning.
Also use M32() in y4m.c.
Enable -Wstrict-aliasing again since all such warnings are fixed.

r1339
100l in deadlock fix

r1338
FLV muxing support

r1337
Fix rare deadlock introduced in weightp

r1336
Actually add -Wno-strict-aliasing to configure

r1335
Various weightp fixes
Make weightp results match in threaded vs non-threaded mode.
Fix two-pass with slow-firstpass.

r1334
Fix all aliasing violations
New type-punning macros perform write/read-combining without aliasing violations per the second-to-last part of 6.5.7 in the C99 specification.
GCC 4.4, however, doesn't seem to have read this part of the spec and still warns about the violations.
Regardless, it seems to fix all known aliasing miscompilations, so perhaps the GCC warning generator is just broken.
As such, add -Wno-strict-aliasing to CFLAGS.

r1333
Fix 10l in weightp on ARM

r1332
Fix one (of possibly many) miscompilations in weightp
Use NOINLINE and some emms calls to fix emms reordering issues.
This issue occurred with some GCC versions if threads > 1 and the phase of the moon was right.
Also a cosmetic in x264.c.

r1331
Fix pixel_ssd on win64
Didn't preserve XMM registers, may or may not have caused problems.

r1330
Fix weightp logfile parsing on MinGW

r1329
cosmetics

r1328
Fix weightp on ARM + PPC
No ARM or PPC assembly yet though.

r1327
Weighted P-frame prediction
Merge Dylan's Google Summer of Code 2009 tree.
Detect fades and use weighted prediction to improve compression and quality.
"Blind" mode provides a small overall quality increase by using a -1 offset without doing any analysis, as described in JVT-AB033.
"Smart", the default mode, also performs fade detection and decides weights accordingly.
MB-tree takes into account the effects of "smart" analysis in lookahead, even further improving quality in fades.
If psy is on, mbtree is on, interlaced is off, and weightp is off, fade detection will still be performed.
However, it will be used to adjust quality instead of create actual weights.
This will improve quality in fades when encoding in Baseline profile.

Doesn't add support for interlaced encoding with weightp yet.
Only adds support for luma weights, not chroma weights.
Internal code for chroma weights is in, but there's no analysis yet.
Baseline profile requires that weightp be off.
All weightp modes may cause minor breakage in non-compliant decoders that take shortcuts in deblocking reference frame checks.
"Smart" may cause serious breakage in non-compliant decoders that take shortcuts in handling of duplicate reference frames.

Thanks to Google for sponsoring our most successful Summer of Code yet!

r1326
Fix assert failure in the case of forced i-frames
Note that this applies to non-IDR i-frames, not IDR-frames.
This fix is also required for future open-gop.

r1325
Fix issues relating to input/output files being pipes/FIFOs

r1324
Various ARM-related fixes
Fix comment for mc_copy_neon.
Fix memzero_aligned_neon prototype.
Update NEON (i)dct_dc prototypes.
Duplicate x86 behavior for global+hidden functions.

r1323
Fix miscompilation with gcc 4.3 on ARM
Aliasing violation in spatial prediction caused nasty artifacts.
Shut up two other GCC warnings while we're at it.

r1322
Fix extremely rare infinite loop in 2-pass VBV
Implicit conversion from double->float lost enough precision to cause the loop termination condition to never trigger.
Bug report by Tal Aloni.
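
The general effect is easy to reproduce. The sketch below does not recreate the x264 loop; it only shows, under that caveat, how a value that changes in double precision can stop changing once it is rounded to single precision, so that a loop waiting for it to move would never finish.

```python
import struct


def to_float32(x):
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


big = 16777216.0  # 2**24, the last point where float32 can step by 1
print(big + 1.0 == big)              # False: a double sees the increment
print(to_float32(big + 1.0) == big)  # True: the increment is lost in float32
```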

r1321
Fix large file support, broken in r1302

r1320
Dramatically reduce size of pixel_ssd_* asm functions
~10k of code size eliminated.

r1319
fix bottom-right pixel of lowres planes, which was uninitialized.
weirdly, valgrind reported this only with --no-asm.

r1310
cosmetics

r1309
ISC-license x86inc.asm
As the assembly abstraction layer is very useful in non-x264 projects, it is now ISC (simplified BSD) so that others, even in commercial projects, can use it as well.

r1308
Various minor CABAC optimizations

r1307
Fix bug in b-pyramid strict
Bug caused invalid streams in some situations.

r1306
Remove non-mod16 warning
Compression only "suffers" by an extremely marginal amount and too many people misinterpret the warning.

r1305
Fix two warnings + some minor optimizations

r1304
Fix a typo in b-pyramid help
And an errant space in common/macroblock.c

r1303
A bit more write-combining in macroblock_cache_load

r1302
split muxers.c into one file per format
simplify internal muxer API

r1301
Update fprofile with the latest change to b-pyramid

r1300
Fix assertion fail and incorrect costs with pyramid+VBV
Deal properly with QPfile'd B-refs. x264 should handle multiple B-refs per minigop now, though only via forced frametypes.

r1299
Improve CRF initial QP selection, fix get_qscale bug
If qcomp=1 (as in mb-tree), we don't need ABR_INIT_QP.
get_qscale could give slightly weird results with still images.

r1298
Print more accurate error message if dump_yuv fails

r1297
Reduce memory usage of b-adapt 2 trellis
Also fix a minor bug where the algorithm ignored the last frame in the trellis.

r1296
Make B-pyramid spec-compliant
The rules of the specification with regard to picture buffering for pyramid coding are widely ignored.
x264's b-pyramid implementation, despite being practically identical to that proposed by the original paper, was technically not compliant.
Now it is.
Two modes are now available:
1) strict b-pyramid, while worse for compression, follows the rule mandated by Blu-ray (no P-frames can reference B-frames)
2) normal b-pyramid, which is like the old mode except fully compliant.
This patch also adds MMCO support (necessary for compliant pyramid in some cases).
MB-tree still doesn't support b-pyramid (but will soon).

r1295
Add missing free for nal_buffer
Fixes a memory leak.

r1294
sync yasm macros to ffmpeg

r1293
eliminate some divisions

r1292
Fix glitches with slow-firstpass + weightb + multiref + 2pass
Bug in r1277

r1291
Simplify some code in b-adapt 2's trellis

r1290
Fix a very rare integer overflow in slicetype analysis
Caused an assert failure when it occurred.
Bug is as old as adaptive B-frames.

r1289
Reduce the aggressiveness of 2-pass VBV
Now that B-frames are properly covered, we don't have to be as aggressive.
This eliminates some issues with skyrocketing QPs in B-frames in 2-pass VBV.

r1288
Fix regression: disable flash detection without B-frames

r1287
change all dct arrays to 1d.
the C standard doesn't allow you to iterate 1-dimensionally over 2d arrays, and nothing other than the dsp functions themselves cares about the 2dness of dct.
this fixes a miscompilation in x264_mb_optimize_chroma_dc.

r1286
Add row-based VBV for B-frames
While B-frames still aren't explicitly covered by ratecontrol, this should resolve issues of VBV underflows due to larger-than-expected B-frames.

r1285
Improve VBV, fix bug in 2-pass VBV introduced in MB-tree
Bug caused AQ'd row/frame costs to not be calculated (and thus caused underflows).
Also make VBV more aggressive with more threads in 2-pass mode.
Finally, --ratetol now affects VBV aggressiveness (higher is less aggressive).

r1284
Optimize exp2fix8
Slightly faster and more accurate rounding.

r1283
Avoid scenecuts in flashes and similar situations
"Flashes" are defined as any scene which lasts a very short period before a previous scene returns.
A common example of this is of course a camera flash.
Accordingly, look ahead during scenecut analysis and rule out the possibility of certain frames being scenecuts.
Also handles cases of tons of short scenes in sequence and avoids making those scenecuts as well.
Can only catch flashes of 1 frame in length with b-adapt 1.
With b-adapt 2, can catch flashes of length --bframes.
Speed cost should be negligible.

r1282
Fix bug where x264 generated non-compliant bitstreams with insane SAR values

r1281
rm msvc project files and related ifdefs

r1280
SSE4 version of 4x4 idct
27->24 clocks on Nehalem.
This is really just an excuse to use "movsd" in a real function.
Add some comments to subsum-related macros in x86util.

r1279
Constrained intra prediction support
Enable with --constrained-intra. Significantly reduces compression, but required for the base layer of SVC encodes and maybe some other use-cases.

Commit sponsored by a media streaming company that wishes to remain anonymous.

r1278
Slightly improve non-RD p8x8 mode decision
Subpartition costs are effectively zero in CABAC if sub-8x8 search is off.

r1277
Reorder reference frames optimally on second pass
About +0.1-0.2% compression at normal bitrates, up to +1% at very low bitrates.
Only works if the first pass uses the same number of refs as the second (i.e. not with fast first pass).
Thus, only worthwhile at insanely slow speeds: as such, enable slow-firstpass by default with preset placebo.
Note that this changes the stats file format!

r1276
Fix typo in ratecontrol_summary

r1275
Clip log2_max_frame_num
It's still much higher than it needs to be, but that will be fixed with the upcoming MMCO patch.
Also make sure we don't write too large a frame_num or poc in slice header.

r1274
Fix some issues with 3-pass statsfile handling
The value of i_frame during encoder_close was incorrect.

r1273
Fix ctrl-C termination message with few frames encoded

r1272
Add support for single-frame VBV, improve compliance
This allows both constant-framesize and capped-framesize encoding.
Literal constant framesize isn't actually supported yet due to the lack of
filler support.
Example with 30fps video: --vbv-bufsize 200 --vbv-maxrate 6000 will ensure that
no frame is ever larger than 200 kilobits.

One example use-case of this is for zero-delay streaming where bandwidth costs
need to be minimized. If every frame is smaller than 200 kilobits and the
client has a 6 megabit connection, every single frame can be instantly sent
to the client and handled without any decoder-side buffer.
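
The numbers in this example follow from simple arithmetic, sketched below with invented function names: at 30 fps, a 6000 kbit/s maxrate delivers 200 kilobits per frame interval, so a 200-kilobit buffer holds exactly one frame's worth of data.

```python
def per_frame_budget_kbits(vbv_maxrate_kbps, fps):
    """Kilobits that arrive during one frame interval at the given rate."""
    return vbv_maxrate_kbps / fps


def send_time_seconds(frame_kbits, link_kbps):
    """Time needed to push one frame through a link of the given speed."""
    return frame_kbits / link_kbps


budget = per_frame_budget_kbits(6000, 30)  # 200.0 kilobits per frame
delay = send_time_seconds(200, 6000)       # equals one 30 fps frame interval
print(budget, delay)
```

A worst-case 200-kilobit frame therefore takes one frame interval (1/30 s) to cross a 6-megabit link, which is why no extra decoder-side buffering is needed in this scenario.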

Fix a mistake in VBV calculation--this may have caused the VBV to be slightly
non-compliant in some situations without x264 realizing it.
Add primitive prediction handling for rows with quantizers lower than their
reference. This slightly improves VBV in CBR mode.
Various other minor improvements to VBV, mostly to make single-frame VBV work.

Commit sponsored by a media streaming company that wishes to remain anonymous.

r1271
Fix 10l in API change
frame_num was set to 1, not 0, for the first frame. This broke spec compliance.
Didn't actually seem to cause any problems though except for breaking decoding on Quicktime.

r1270
Allow user-set FPS for inputs other than YUV

r1269
Improve threaded frame handling
Avoid unnecessary cond_wait

r1268
Attempt to detect miscompilation due to bug in gcc 4.2
I don't know if this bug still affects latest x264, but it can't hurt to try to detect it.
Accordingly refuse to open the encoder if detected.
Apparently VLC (on Windows) has been distributed for some time with a completely broken x264 due to the use of a completely broken compiler (gcc 4.2). In particular, the MV costs seem to be calculated incorrectly on win32 when linking from an application compiled without -ffast-math to an application with -ffast-math.
I am not entirely certain why this occurs, but the result is, unsurprisingly, encoding quality that makes MPEG-2 look good, due to the motion search being completely broken.

r1267
Really fix encoder_close crash this time
Not-entirely-fixed in r1253.

r1266
Check for 16x16 partitions masquerading as smaller ones
Saves a few bits when using qpel-RD.

r1265
Update config.guess/sub; add Snow Leopard support

r1264
Fix integer overflow in 2-pass VBV
Bug caused slight undersizing in 2-pass mode in some cases.

r1263
Fix bug with various bizarre commandline combinations and mbtree
Second pass would have mbtree on even though the first pass didn't (and thus encoding would immediately fail).

r1262
Add intra prediction modes to output stats
Also eliminate some NANs in stat output with intra-only encoding.
Marginal speedup: disable stat calculation if log level is below X264_LOG_INFO.
Various minor cosmetics.

r1261
Overhaul syntax in muxers.c/matroska.c
The inconsistent syntax in these files has finally come to an end.

r1260
Major API change: encapsulate NALs within libx264
libx264 now returns NAL units instead of raw data. x264_nal_encode is no longer a public function.
See x264.h for full documentation of changes.
New parameter: b_annexb, on by default. If disabled, startcodes are replaced by sizes as in mp4.
x264's VBV now works on a NAL level, taking into account escape codes.
VBV will also take into account the bit cost of SPS/PPS, but only if b_repeat_headers is set.
Add an overhead tracking system to VBV to better predict the constant overhead of frames (headers, NALU overhead, etc).
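
The difference between the two framings selected by b_annexb can be sketched as follows. This is an illustration, not libx264 code: it assumes 4-byte start codes and 4-byte big-endian length fields, and the function names and sample payloads are made up.

```python
import struct

START_CODE = b"\x00\x00\x00\x01"  # assumed 4-byte Annex B start code


def frame_annexb(nals):
    """Prefix each NAL payload with a start code (byte-stream framing)."""
    return b"".join(START_CODE + nal for nal in nals)


def frame_length_prefixed(nals):
    """Prefix each NAL payload with its size, as mp4-style framing does."""
    return b"".join(struct.pack(">I", len(nal)) + nal for nal in nals)


nals = [b"\x67\x42", b"\x68"]  # dummy payloads, not real SPS/PPS data
print(frame_annexb(nals).hex())
print(frame_length_prefixed(nals).hex())
```

Both outputs carry the same payloads; only the per-NAL prefix differs, which is why a muxer can pick whichever form its container expects.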

r1259
Add missing fclose for mbtree input statsfile on second pass
Bug report by VFRmaniac

r1258
Improve progress indicator behavior
Progress indicator will now indicate based on output frame, not input frame.

r1257
Update yasm configure check
lzcnt apparently requires yasm 0.6.2.

r1256
Make MV costs global instead of static
Fixes some extremely rare threading race conditions and makes the code cleaner.
Downside: slightly higher memory usage when calling multiple encoders from the same application.

r1255
Don't print scenecut message multiple times in verbose mode
Occurred mostly with b-adapt 2.

r1254
Optimize rounding of luma and chroma DC coefficients
Reduce bitrate mostly-losslessly at low quantizers.
In some rare cases, bitrate reduction may be as high as 10%.
Luma rounding optimization (helps much less than chroma) requires trellis.

r1253
Fix crash if encoder_close is called before delayed frames are flushed
Also no longer flush frames when ctrl-Cing x264, so x264 will close faster.

r1252
Improve x264 help
Now has three help options: --help, --longhelp, and --fullhelp.
--help only shows the most basic options; most users should not need more than these.
Add usage examples.
Fix typo in a comment.

r1251
Factor out a redundant RD call in qpel-RD
Fixes a problem that was supposed to be fixed in r1238 but was not fully fixed.

r1250
Fix RD early-skip
Small quality improvement and speedup, was broken by r1214.

r1249
Faster CAVLC mb header writing for B macroblocks

r1248
Compile fixes for pre-ARMv6T2 and/or PIC

r1247
Change priority handling on some OSs
Instead of setting the lookahead thread to max priority, lower all the other threads' priorities instead.
This is particularly useful when the "max priority" is "realtime", as in Windows, which can cause some problems.

r1246
Threaded lookahead
Move lookahead into a separate thread, set to higher priority than the other threads, for optimal performance.
Reduces the amount that lookahead bottlenecks encoding, greatly increasing performance with lookahead-intensive settings (e.g. b-adapt 2) on many-core CPUs.
Buffer size can be controlled with --sync-lookahead, which defaults to auto (threads+bframes buffer size).
Note that this buffer is separate from the rc-lookahead value.
Note also that this does not split lookahead itself into multiple threads yet; this may be added in the future.
Additionally, split frames into "fdec" and "fenc" frame types and keep the two separate.
This split greatly reduces memory usage, which helps compensate for the larger lookahead size.
Extremely special thanks to Michael Kazmier and Alex Giladi of Avail Media, the original authors of this patch.

r1245
Force a link error in case of incompatible API
This is because the number of bug reports due to miscompiled ffmpeg builds is reaching critical mass.
The name of x264_encoder_open is now #defined based on the current X264_BUILD.
Note that this changes the calling convention required for dlopen, but not for ordinary calls to x264_encoder_open.
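For callers that resolve the symbol by name (dlopen/dlsym style), the practical effect is that the name to look up now depends on the build number. A minimal sketch, assuming the versioned name has the form x264_encoder_open_<build>; the helper name and the build number 76 are illustrative only:

```python
def encoder_open_symbol(build: int) -> str:
    """Return the symbol name a dynamic loader would have to look up.

    Assumption: the versioned name is the plain name plus "_<X264_BUILD>".
    """
    if build <= 0:
        raise ValueError("build number must be positive")
    return f"x264_encoder_open_{build}"


# 76 is a hypothetical build number used only for this example.
print(encoder_open_symbol(76))
```

Code that calls x264_encoder_open directly needs no change, since the header performs the renaming at compile time.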

r1244
Get rid of "CBR" descriptor from qcomp
Though technically accurate in some vague way, I have never actually seen this
option used correctly, rather it has been used by hundreds of people who can't
read the documentation and believe that qcomp=0 is what should be used for CBR
encoding.

r1243
Faster me=tesa
But it still spends all too much time in me_search_ref rather than asm.

r1242
Multi-slice encoding support
Slicing support is available through three methods (which can be mixed):
--slices sets a number of slices per frame and ensures rectangular slices (required for Blu-ray). Overridden by either of the following options:
--slice-max-mbs sets a maximum number of macroblocks per slice.
--slice-max-size sets a maximum slice size, in bytes (includes NAL overhead).
Implement macroblock re-encoding support to allow highly accurate slice size limitation. Might be useful for other things in the future, too.
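As a rough picture of what a macroblock-count limit such as --slice-max-mbs implies, the sketch below cuts a frame into slices in raster order. It is an assumption-laden illustration (greedy split, hypothetical function name), not x264's actual slice assignment.

```python
from typing import List, Tuple


def split_by_max_mbs(total_mbs: int, max_mbs: int) -> List[Tuple[int, int]]:
    """Return (first_mb, last_mb) pairs, each covering at most max_mbs macroblocks.

    Greedy raster-order split, assumed here for illustration only.
    """
    if total_mbs <= 0 or max_mbs <= 0:
        raise ValueError("both arguments must be positive")
    return [
        (start, min(start + max_mbs, total_mbs) - 1)
        for start in range(0, total_mbs, max_mbs)
    ]


# A 640x480 frame has 40 x 30 = 1200 macroblocks of 16x16 pixels.
print(split_by_max_mbs(1200, 500))  # [(0, 499), (500, 999), (1000, 1199)]
```

A byte limit such as --slice-max-size cannot be planned up front like this, because slice size is only known after encoding; that is where the re-encoding support mentioned above comes in.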

r1241
Fix a valgrind warning in b-adapt 2

r1240
fix asm symbols for oprofile (regression in r1221)

r1239
Fix bug in intra analysis in B-frames
i8x8/i4x4 never got analysed when fast_intra was toggled and RD was off; up to a 2-3% quality improvement in non-RD mode.
With this bug dating back to r369, this is probably the second-oldest bug ever fixed in x264.

r1238
Fix bug in b16x16 qpel RD
Incorrect cost was used to initialize the search.

r1237
Check minimum chroma QP in addition to luma QP during CQM init
Correctly error out if the implied minimum chroma QP is too low.
Add missing emms to checkasm macroblock_tree_propagate test.

r1236
Faster mbtree propagate and x264_log2, less memory usage
Avoid an int->float conversion with a small table.
Change lowres_inter_types to a bitfield; cut its size by 75%.
Somewhat lower memory usage with lots of bframes.
Make log2/exp2 tables global to avoid duplication.
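The table-based log2 idea can be sketched like this: take the position of the leading one bit as the integer part and look up the fractional part from the next few bits. The table size (128 entries) and the float table are assumptions made for the example, not x264's actual layout.

```python
import math

FRAC_BITS = 7  # assumption: index the table with the 7 bits below the leading one
LOG2_LUT = [math.log2(1 + i / (1 << FRAC_BITS)) for i in range(1 << FRAC_BITS)]


def log2_table(x: int) -> float:
    """Approximate log2(x) for a positive integer using a small lookup table."""
    if x <= 0:
        raise ValueError("x must be positive")
    exponent = x.bit_length() - 1  # position of the leading one bit
    # Normalize so the leading one sits at bit FRAC_BITS.
    if exponent >= FRAC_BITS:
        mantissa = x >> (exponent - FRAC_BITS)
    else:
        mantissa = x << (FRAC_BITS - exponent)
    index = mantissa & ((1 << FRAC_BITS) - 1)  # drop the leading one
    return exponent + LOG2_LUT[index]


for value in (1, 3, 1000):
    print(value, round(log2_table(value), 4), round(math.log2(value), 4))
```

Because the low bits are truncated, the result is never above the true value and the error stays below log2(1 + 1/128), roughly 0.011.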

r1235
Fix keyint=1 + VBV + rc-lookahead

r1234
Faster x264_exp2fix8
22->13 cycles on Core 2 with mfpmath=sse

r1233
compile x86 with fpmath=sse by default

r1232
ARM configure: enable NEON-related options by default
When compiling for ARM, x264 will compile by default for Cortex A8 unless specified otherwise.
To compile for pre-ARMv6, --disable-asm is required.

r1231
2-pass VBV fixes
Properly run slicetype frame cost with 2pass + MB-tree.
Slash the VBV rate tolerance in 2-pass mode; increasing it made sense for the highly reactive 1-pass VBV algorithm, but not for 2-pass.
2-pass's planned frame sizes are guaranteed to be reasonable, since they are based on a real first pass, while 1-pass's, based on lookahead SATD, cannot always be trusted.

r1230
GSOC merge part 8: ARM NEON intra prediction assembly functions (partial)
4x4 dc/h/ddr/ddl, 8x8 dc/h, 8x8c h/v, 16x16 dc/h/v

r1229
GSOC merge part 7: ARM NEON deblock assembly functions (partial)
Originally written for ffmpeg by Mans Rullgard; ported by David.
Luma and chroma inter deblocking; no intra yet.

r1228
GSOC merge part 6: ARM NEON quant assembly functions (partial)
(de)quant 4x4, (de)quant 8x8, (de)quant DC, coeff_last

r1227
GSOC merge part 5: ARM NEON dct assembly functions
(i)dct4x4dc, (i)dct4x4, (i)dct8x8, (i)dct_dc, zigzag_scan_frame_4x4

r1226
GSOC merge part 4: ARM NEON mc assembly functions
prefetch, memcpy_aligned, memzero_aligned, avg, mc_luma, get_ref, mc_chroma, hpel_filter, frame_init_lowres

r1225
GSOC merge part 3: ARM NEON pixel assembly functions
SAD, SADX3/X4, SSD, SATD, SA8D, Hadamard_AC, VAR, VAR2, SSIM

r1224
GSOC merge part 2: ARM stack alignment
Neither GCC nor ARMCC supports 16-byte stack alignment despite the fact that NEON loads require it.
These macros only work for arrays, but fortunately that covers almost all instances of stack alignment in x264.

r1223
Fix unaligned accesses in bitstream writer
Fixes x264 on CPUs with no unaligned access support (e.g. SPARC).
Improves performance marginally on CPUs with penalties for unaligned stores (e.g. some x86).

r1222
Fix bug in calculation of I-frame costs with AQ.

r1221
GSOC merge part 1: Framework for ARM assembly optimizations
x264 will detect which ARM core it's building for and only build NEON asm if the target is ARMv6 or above, then enable NEON at runtime.

r1220
Fix a bug in checkasm and two OSX fixes
MC chroma checkasm test could crash in some situations
Remove -lmx, as it's not needed and the iPhone doesn't have it.
Remove unused sqrtf emulation; it breaks if math.h is included.

r1219
Improve QPRD
Always check the last macroblock's QP, even if the normal search doesn't reach it.
Raise the failure threshold when moving towards the last macroblock's QP.
0.2-1% improved compression.

r1218
Fix MB-tree with keyint<3
Also slightly improve VBV keyint handling.

r1217
Fix bug in VBV lookahead + no MB-tree
I-frames need to have VBV lookahead run on them as well.

r1216
Add support for frame-accurate parameter changes
Parameter structs can now be passed with individual frames.
The previous method would only change the parameter of what was currently being encoded, which due to delay might be very far from an intended exact frame.
Also add support for changing aspect ratio. Only works in a stream with repeating headers and requires the caller to force an IDR to ensure instant effect.

r1215
Fix x264_encoder_reconfig with multithreading
New behavior: reconfigging the encoder will result in changes being applied
to each of the encoding threads as they finish encoding the current frame.

r1214
Fix two bugs in QPRD
QPRD could in some cases force blocks to skip when they shouldn't (~+0.01db).
Force QPRD to abide by qpmin/qpmax restrictions.

r1213
Lookahead VBV
Use the large-scale lookahead capability introduced in MB-tree for ratecontrol purposes.
(Does not require MB-tree, however.)
Greatly improved quality and compliance in 1-pass VBV mode, especially in CBR; +2db OPSNR or more in some cases.
Fix some other bugs in VBV, which should improve non-lookahead mode as well.
Change the tolerance algorithm in row VBV to allow for more significant mispredictions when buffer is nearly full.
Note that due to the fixing of an extremely long-standing bug (>1 year), bitrates may change by nontrivial amounts in CRF without MB-tree.

r1212
Fix bug in b-adapt 1
B-adapt 1 didn't use more than MAX(1,bframes-1) B-frames when MB-tree was off.

r1211
Fix a potential failure in VBV
If VBV does underflow, ratecontrol could be permanently broken for the rest of the clip.
Revert part of the previous VBV changes to fix this.

r1210
new API function x264_encoder_delayed_frames.
fix x264cli on streams whose total length is less than the encoder latency.

r1209
Add no-mbtree to fprofile (and fix pyramid in fprofile)

r1208
Don't print a warning about direct=auto in 2pass when B-frames are off

r1207
fix lowres padding, which failed to extrapolate the right side for some resolutions.
fix a buffer overread in x264_mbtree_propagate_cost_sse2. no effect on actual behavior, only theoretical correctness.
fix x264_slicetype_frame_cost_recalculate on I-frames, which previously used all 0 mb costs.
shut up a valgrind warning in predict_8x8_filter_mmx.

r1206
simd part of x264_macroblock_tree_propagate.
1.6x faster on conroe.

r1205
MB-tree fixes:
AQ was applied inconsistently, with some AQed costs compared to other non-AQed costs. Strangely enough, fixing this increases SSIM on some sources but decreases it on others. More investigation needed.
Account for weighted bipred.
Reduce memory, increase precision, simplify, and early terminate.

r1204
Add missing free()s for new data allocated for MB-tree
Eliminates a memory leak.

r1203
Fix keyframe insertion with MB-tree and no B-frames

r1202
Fix MP4 output (bug in malloc checking patch)

r1201
Gracefully terminate in the case of a malloc failure
Fuzz tests show that all mallocs appear to be checked correctly now.

r1200
Fix a potential infinite loop in QPfile parsing on Windows
ftell doesn't seem to work properly on Windows in text mode.

r1199
Fix delay calculation with multiple threads
Delay frames for threading don't actually count as part of lookahead.

r1198
Add "veryslow" preset
Apparently some people are actually *using* placebo, so I've added this preset to bridge the gap.

r1197
Macroblock-tree ratecontrol
On by default; can be turned off with --no-mbtree.
Uses a large lookahead to track temporal propagation of data and weight quality accordingly.
Requires a very large separate statsfile (2 bytes per macroblock) in multi-pass mode.
Doesn't work with b-pyramid yet.
Note that MB-tree inherently measures quality differently from the standard qcomp method, so bitrates produced by CRF may change somewhat.
This makes the "medium" preset a bit slower. Accordingly, make "fast" slower as well, and introduce a new preset "faster" between "fast" and "veryfast".
All presets "fast" and above will have MB-tree on.
Add a new option, --rc-lookahead, to control the distance MB tree looks ahead to perform propagation analysis.
Default is 40; larger values will be slower and require more memory but give more accurate results.
This value will be used in the future to control ratecontrol lookahead (VBV).
Add a new option, --no-psy, to disable all psy optimizations that don't improve PSNR or SSIM.
This disables psy-RD/trellis, but also other more subtle internal psy optimizations that can't be controlled directly via external parameters.
Quality improvement from MB-tree is about 2-70% depending on content.
Strength of MB-tree adjustments can be tweaked using qcompress; higher values mean lower MB-tree strength.
Note that MB-tree may perform slightly suboptimally on fades; this will be fixed by weighted prediction, which is coming soon.
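To get a feel for the "2 bytes per macroblock" statsfile figure, here is a back-of-the-envelope estimate. It assumes 16x16 macroblocks with dimensions rounded up, one record per macroblock for every frame, and no file headers; the function name is hypothetical.

```python
def mbtree_stats_bytes(width: int, height: int, frames: int, bytes_per_mb: int = 2) -> int:
    """Estimate the size of the MB-tree statsfile under the assumptions above."""
    mbs_wide = (width + 15) // 16   # round up to whole macroblocks
    mbs_high = (height + 15) // 16
    return mbs_wide * mbs_high * frames * bytes_per_mb


per_frame = mbtree_stats_bytes(1920, 1080, frames=1)
one_hour = mbtree_stats_bytes(1920, 1080, frames=24 * 3600)
print(per_frame)  # 16320 bytes for 120 x 68 macroblocks
print(one_hour)   # 1410048000 bytes, about 1.4 GB for an hour at 24 fps
```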

r1196
Various 1-pass VBV tweaks
Make predictors have an offset in addition to a multiplier.
This primarily fixes issues in sources with lots of extremely static scenes, such as anime and CGI.
We tried linear regressions, but they were very unreliable as predictors.
Also allow VBV to be slightly more aggressive in raising QPs to avoid not having enough bits left in some situations.
Up to 1db improvement on some clips.

r1195
Fix another 10L in QPRD
An entry in subpel_iterations was missing.
I have no idea how QPRD was working at all without this change.

r1194
Update help and cleanup in ratecontrol.c
Deal with some out-of-date information.

r1193
15% faster refine_bidir_satd, 10% faster refine_bidir_rd (or less with trellis=2)
re-roll a loop (saves 44KB code size, which is the cause of most of this speed gain)
don't re-mc mvs that haven't changed

r1192
Faster bidir_rd plus some bugfixes
Cache chroma MC during refine_bidir_rd and use both the luma and chroma caches to skip MC in macroblock_encode.
Fix incorrect call to rd_cost_part; refine_bidir_rd output was incorrect for i8>0.
Remove some redundant clips.
~12% faster refine_bidir_rd.

r1191
Add "fastdecode" tune option
It does what it says it does.

r1190
Fix two bugs in QPRD
fprofile settings now actually fprofile QPRD.
Don't use i_mbrd before initializing it.

r1189
Fix 10l in QPRD
Trellis used wrong lambda with trellis=1

r1188
Fix a nondeterminism with threads and subme>7
Also add a few more checks to eliminate the need for spel_border.

r1187
Add QPRD support as subme=10
Refactor trellis lambda selection to be done in analyse_init instead of in trellis.
This will allow for easier adaptation of lambda later on; for now it allows constant lambda across variable QPs.
QPRD is only available with adaptive quantization enabled and generally improves SSIM and visual quality.
Additionally, weight the SSD values from RD based on the relative QP offset for chroma; helps visually at high QPs where chroma has a lower QP than luma.
This fixes some visual artifacts created by QPRD at high QPs.
Note that this generally hurts PSNR and SSIM, and so is only on when psy-RD is on.

r1186
SSSE3 cachesplit workaround for avg2_w16
Palignr-based solution for the most commonly used qpel function.
1-1.5% faster overall on Core 2 chips.

r1185
shut up valgrind warnings in trellis

r1184
New AQ algorithm option
"Auto-variance" uses log(var)^2 instead of log(var) and attempts to adapt strength per-frame.
Generates significantly better SSIM; on by default with --tune ssim.
Whether it generates visually better quality is still up for debate.
Available as --aq-mode 2.

r1183
Cacheline-split SSSE3 chroma MC
~70% faster chroma MC on 32-bit Conroe
Also slightly faster SSSE3 intra_sad_8x8c

r1182
Improve documentation of qp/crf options

r1181
Merge array_non_zero into zigzag_sub
Faster lossless, cleaner code.
SSSE3 version of zigzag_sub_4x4_field, faster lossless interlaced coding.

r1180
Fix bug in reference frame autoadjustment
For some types of input file, x264 did the adjustment before width/height were known.

r1179
Fix fprofile settings to match changes in defaults
Also add b-adapt 2 to fprofile.

r1178
Slightly faster dequant_flat assembly
Eliminate some redundant shifts.

r1177
Totally new preset system for x264.c (not libx264), new defaults
Other new features include "tune" and "profile" settings; see --help for more details.
Unlike most other settings, "preset" and "tune" act before all other options.
However, "profile" acts afterwards, overriding all other options.
Our defaults have also changed: new defaults are --subme 7 --bframes 3 --8x8dct --no-psnr --no-ssim --threads auto --ref 3 --mixed-refs --trellis 1 --weightb --crf 23 --progress.
Users will hopefully find these changes to greatly improve usability.
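The ordering described above (preset and tune first, then explicit options, then profile last) can be sketched as a series of dictionary updates. The defaults are taken from the list above; the preset, tune, and profile tables are hypothetical stand-ins for illustration, not x264's real values.

```python
# Defaults as listed in this entry (subset).
DEFAULTS = {"subme": 7, "bframes": 3, "ref": 3, "trellis": 1, "crf": 23, "8x8dct": True}

# Hypothetical tables for illustration only; not x264's actual preset contents.
PRESETS = {"demo-fast": {"subme": 4, "ref": 1}}
TUNES = {"demo-tune": {"trellis": 2}}
PROFILES = {"baseline": {"bframes": 0, "8x8dct": False}}


def resolve_options(preset=None, tune=None, user=None, profile=None):
    """Apply settings in the documented order: preset, tune, user options, profile."""
    options = dict(DEFAULTS)
    options.update(PRESETS.get(preset, {}))    # acts before all other options
    options.update(TUNES.get(tune, {}))        # acts before all other options
    options.update(user or {})                 # explicit options override preset/tune
    options.update(PROFILES.get(profile, {}))  # profile overrides everything
    return options


result = resolve_options(preset="demo-fast", user={"ref": 5, "bframes": 8}, profile="baseline")
print(result["ref"], result["bframes"], result["subme"])  # 5 0 4
```

In the example the explicit ref value beats the preset, while the profile still forces B-frames off even though the user asked for eight.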

r1176
Update Gabriel's email address in AUTHORS

r1175
Early termination for chroma encoding
Faster chroma encoding by terminating early if heuristics indicate that the block will be DC-only.
This works because the vast majority of inter chroma blocks have no coefficients at all, and those that do are almost always DC-only.
Add two new helper DSP functions for this: dct_dc_8x8 and var2_8x8, with mmx/sse2/ssse3 versions of each.
Early termination is disabled at very low QPs due to it not being useful there.
Performance increase is ~1-2% without trellis, up to 5-6% with trellis=2.
Increase is greater with lower bitrates.

r1174
Fix bug in checkasm
frame_init_lowres_core check didn't check the C plane.
However, all x86 and PPC assembly was correct regardless of the unit test being incorrect.

r1173
Add subpartition cost for sub-8x8 blocks
Improves sub-p8x8 mode decision.

r1172
Yet more CABAC and CAVLC optimizations
Also clean up a lot of pointless code duplication in CAVLC MV coding.

r1171
Various CABAC optimizations and cleanups
Faster CABAC CBF context calculation for inter blocks.
Add x264_constant_p(), will probably be useful in the future as well.
Simpler subpartition functions.
Clean up and optimize mvd_cpn a bit more.
Various other minor optimizations.

r1170
AltiVec version of frame_init_lowres_core. 22.4x faster than C on PPC7450 and 25x on PPC970MP.

r1169
MMX CABAC mvd sum calculation
Faster CABAC mvd coding.

r1168
Faster MV prediction
Smaller code size, plus I get to use goto.

r1167
Fix potential crash in checkasm
ssim_end4_sse2 requires aligned sums

r1166
SSSE3, faster SSE2/MMX integral_init4v
The real reason I wrote this was an excuse to use shufpd.

r1165
configure check for uclinux

r1164
fix a crash on frame width <= 48 pixels

r1163
configure check for cc, rather than reporting lack of compiler as an asm error.
configure check for -mno-cygwin, since it's removed from gcc4.

r1162
a better way to keep track of mv candidates.
2-4% faster dia, hex, and umh.

r1161
reorder some motion estimation patterns.
this change is useless on its own, but segregates the bitstream-changing part out of my next optimization.

r1160
Fix VBV warning broken in r915
x264 will now correctly warn about maxrate specified without bufsize even when a level is not set.

r1159
configure check for ssse3-capable binutils

r1158
Fix 10L in r1155
Broke --me esa/tesa due to forgetting to add handling for x264_cost_mv_fpel.

r1157
Fix bug where satd was incorrectly used with subme<=1
Faster subme<=1 with i4x4 enabled.

r1156
Remove some pointless error handling code in cabac/cavlc

r1155
Save some memory on mv cost arrays
Have quantizers that use the same lambda share the same cost array.

r1154
Various CABAC and CAVLC optimizations
Backport CAVLC partial-inlining early termination to CABAC (~2-4% faster CABAC residual coding)

r1153
fix a race condition at the end of thread_input

r1152
Various trellis speed optimizations

r1151
Make i686 the default arch on x86_32
Disabling asm will default to a generic arch.
Also fix configure for gcc 4.4.

r1150
Faster signed golomb coding
3% faster CAVLC RDO and bitstream writing.
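For reference, signed Exp-Golomb coding as used in H.264 bitstreams maps a signed value onto an unsigned code number and then writes that number with a zero-bit prefix. The sketch below is a plain, unoptimized reference of that mapping, not the faster routine this revision introduces.

```python
def ue_bits(code_num: int) -> str:
    """Unsigned Exp-Golomb: (n - 1) zero bits, then code_num + 1 written in n bits."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    binary = bin(code_num + 1)[2:]
    return "0" * (len(binary) - 1) + binary


def se_bits(value: int) -> str:
    """Signed Exp-Golomb: order values as 0, 1, -1, 2, -2, ... and code as unsigned."""
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return ue_bits(code_num)


for v in (0, 1, -1, 2, -2):
    print(v, se_bits(v))
```

Small magnitudes get the shortest codes, which suits motion vector differences and similar values that cluster around zero.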

r1149
Faster spatial direct MV prediction
unroll/tweak col_zero_flag

r1148
More CABAC and CAVLC optimizations
Simplified function calling for block_residual_write_(cabac|cavlc) and improved sigmap coding.
Tried making 0/1-bit specific versions of CABAC asm, but benefit was minimal under GCC 4.3.
Helped a decent bit under 3.4, but you shouldn't be using such old versions anyways.

r1147
Various optimizations in frametype lookahead

r1146
Some cosmetics/cleanup
Move some macros to x86util.asm that should have been there to begin with.
Fix a typo that didn't cause any issues.

r1145
fix "incompatible types in initialization" compilation issues with GCC 4.3 (which is stricter than previous compiler version)

r1144
fix conversions between vectors with differing element types or numbers of subparts errors

r1143
Add "coded blocks" stat to output information.
This measures the total percentage of blocks, intra and inter, which have nonzero coefficients.
"y,uvAC,uvDC" refers to luma, chroma DC, and chroma AC blocks.
Note that skip blocks are included in this stat.

r1142
Enable asm predict_8x8_filter
I'm not entirely sure how this snuck its way out of holger's intra pred patch.

r1141
Remove various bits of dead code found by CLANG.

r1140
Slightly faster SSE4 SA8D, SSE4 Hadamard_AC, SSE2 SSIM
shufps is the most underrated SSE instruction on x86.

r1139
Various CABAC optimizations
Move calculation of b_intra out of the core residual loop and hardcode it where applicable.
Inlining cabac_mb_mvd was unnecessary and wasted tremendous amounts of code size. Inlining only cache_mvd is faster and significantly smaller.

r1138
CAVLC optimizations
faster bs_write_te, port CABAC context selection optimization to CAVLC.

r1137
Faster CABAC RDO
Since the bypass case is quite unlikely, especially when doing merged sigmap/level coding,
it's faster to use a branch than a cmov.

r1136
Activate intra_sad_x3_8x8c in lookahead

r1135
MBAFF interlaced coding is not allowed in baseline profile

r1134
intra_sad_x3_8x8 assembly

r1133
intra_sad_x3_4x4 assembly

r1132
intra_sad_x3_8x8c assembly
Also fix intra_sad_x3_16x16's use of "n" as a loop variable (broke SWAP)

r1131
Shave one instruction off CABAC encode_decision
range_lps>>6 ranges from 4-7, so (range_lps>>6)-4 == (range_lps>>6) & 3
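The identity is easy to check exhaustively. The sketch below takes the stated range of 4-7 for the shifted value as given and also shows that the two forms diverge outside it; the function names are hypothetical.

```python
def index_by_subtract(q: int) -> int:
    """Original form: subtract 4 from the shifted range value."""
    return q - 4


def index_by_mask(q: int) -> int:
    """Replacement form: keep only the low two bits."""
    return q & 3


# For q in 4..7 (binary 100..111), clearing bit 2 is the same as subtracting 4.
for q in range(4, 8):
    print(q, index_by_subtract(q), index_by_mask(q))

# Outside the stated range the forms differ, e.g. q = 8 gives 4 versus 0.
print(index_by_subtract(8), index_by_mask(8))
```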

r1130
Faster probe_skip
Add a second chroma threshold after the DC transform.

r1129
Add missing "static" qualifier to two arrays
Should slightly improve performance.

r1128
SSE2 zigzag_interleave
Replace PHADD with FastShuffle (more accurate naming).
This flag represents asm functions that rely on fast SSE2 shuffle units, and thus are only faster on Phenom, Nehalem, and Penryn CPUs.

r1127
Faster integral_init
palignr to avoid unaligned loads is worth it in inith, but not initv.

r1126
Faster SSSE3 hpel_filter_v
~10% faster hpel_filter on 64-bit Penryn.
32-bit version by Jason Garrett-Glaser.

r1125
Faster SSE2 pixel_var
Optimized using the DEINTB method from r1122. ~32% faster var_16x16 on Conroe.

r1124
SSSE3 hpel_filter_v
Optimized using the same method as in r1122. Patch partially by Holger.
~8% faster hpel filter on 64-bit Nehalem

r1123
Update some asm copyright headers

r1122
Vastly faster SATD/SA8D/Hadamard_AC/SSD/DCT/IDCT
Heavily optimized for Core 2 and Nehalem, but performance should improve on all modern x86 CPUs.
16x16 SATD: +18% speed on K8(64bit), +22% on K10(32bit), +42% on Penryn(64bit), +44% on Nehalem(64bit), +50% on P4(32bit), +98% on Conroe(64bit)
Similar performance boosts in SATD-like functions (SA8D, hadamard_ac) and somewhat less in DCT/IDCT/SSD.
Overall performance boost is up to ~15% on 64-bit Conroe.

r1121
Update x264 copyright date

r1120
Remove pre-scenecut from fprofile commands as well
Also add psy-trellis to fprofile

r1119
Slightly faster 8x16 SAD on Penryn Core 2
Same as MMX 8x16 cacheline SAD, but calls SSE2 8x16 SAD in non-cacheline case.
Only Nehalem benefits from sizes smaller than 8x16, and Nehalem doesn't use cacheline functions, so no smaller versions are included.

r1118
Fix scenecut and VBV with videos of width/height <= 32
Also remove an unused variable

r1117
Remove non-pre scenecut
Add support for no-b-adapt + pre-scenecut (patch by BugMaster)
Pre-scenecut was generally better than regular scenecut in terms of accuracy and regular scenecut didn't work in threaded mode anyways.
Add no-scenecut option (scenecut=0 is now no scenecut; previously it was -1)
Fix an incorrect bias towards P-frames near scenecuts with B-adapt 2.
Simplify pre-scenecut code.

r1116
Add AltiVec version of hadamard_ac. 2.4x faster than the C version.
Note that this implementation is pretty naive and should be improved
by implementing what's discussed in this ML thread:
date: Mon, Feb 2, 2009 at 6:58 PM
subject: Re: [x264-devel] [PATCH] AltiVec implementation of hadamard_ac routines

r1115
Fix regression in r1085
Deblocking was very slightly incorrect with partitions=all.
Bug found by BugMaster.

r1114
Optimize neighbor CBP calculation and fix related regression
r1105 introduced array overflow in cbp handling

r1113
Show FPS when importing a raw YUV file

r1112
Windows 64-bit support
A "make distclean" is probably required after updating to this revision.

r1111
Minor fixes and cosmetics
Suppress a GCC warning, fix a non-problematic array overflow, one REP->REP_RET.

r1110
fix 10l in 75b495f2723fcb77f
Original thread:
date: Mon, Feb 9, 2009 at 9:37 PM
commit: Spare a vec_perm and a vec_mergeh through using a LUT of permutation vectors. (Guillaume Poirier)
Date: Mon Feb 9 21:17:33 2009 +0100
Spare a vec_perm and a vec_mergeh through using a LUT of permutation vectors.

r1108
Promote chroma planes to 16 byte alignment.
This will allow simplifying vector loads that can only load 16-byte
aligned data (such as AltiVec).

r1107
Fix 10L in intra pred
Forgetting a %define resulted in SIGILL on 32-bit systems without SSE (e.g. Athlon XP).

r1106
Add decimation in i16x16 blocks
Up to +0.04db with CAVLC, generally a lot less with CABAC.

r1105
Much faster CABAC residual context selection
Up to ~17% faster CABAC RDO, ~36% faster intra-only CABAC RDO.
Up to 7% faster overall in extreme cases.

r1104
Faster coeff_last64 on 32-bit

r1103
More intra pred asm optimizations
SSSE3 version of predict_8x8_hu
SSE2 version of predict_8x8c_p
SSSE3 versions of both planar prediction functions
Optimizations to predict_16x16_p_sse2
Some unnecessary REP_RETs -> RETs.
SSE2 version of predict_8x8_vr by Holger.
SSE2 version of predict_8x8_hd.
Don't compile MMX versions of some of the pred functions on x86_64.
Remove now-useless x86_64 C versions of 4x4 pred functions.
Rewrite some of the x86_64-only C functions in asm.

r1102
Speed-up mc_chroma_altivec by using vec_mladd cleverly, and unrolling.
Also put width == 2 variant in its own scalar function because it's faster
than a vectorized one.

r1101
Merging Holger's GSOC branch part 2: intra prediction
Assembly versions of most remaining 4x4 and 8x8 intra pred functions.
Assembly version of predict_8x8_filter.
A few other optimizations.
Primarily Core 2-optimized.

r1100
10l: fix compilation with GCC 4.3+

r1099
Faster 8x8dct+CAVLC interleave
Integrate array_non_zero with the CAVLC 8x8dct interleave function.
Roughly 1.5-2x faster than the original separate array_non_zero method.

r1098
Measure CBP cost in i8x8 RD refinement
~0.02-0.05db PSNR gain at high quants in intra-only encoding, pretty small otherwise.
Allows a small optimization in i8x8 encoding.

r1097
Take advantage of saturated signed horizontal sum instructions in
the variance computation epilogue since there won't be any overflow
triggering an overflow.
Suggested by Loren Merritt

r1096
Massive overhaul of nnz/cbp calculation
Modify quantization to also calculate array_non_zero.
PPC assembly changes by gpoirier.
New quant asm includes some small tweaks to quant and SSE4 versions using ptest for the array_non_zero.
Use this new feature of quant to merge nnz/cbp calculation directly with encoding and avoid many unnecessary calls to dequant/zigzag/decimate/etc.
Also add new i16x16 DC-only iDCT with asm.
Since intra encoding now directly calculates nnz, skip_intra now backs up nnz/cbp as well.
Output should be equivalent except when using p4x4+RDO because of a subtlety involving old nnz values lying around.
Performance increase in macroblock_encode: ~18% with dct-decimate, 30% without at CRF 25.
Overall performance increase 0-6% depending on encoding settings.

r1095
Add PowerPC support for "checkasm --bench", reading the time base register.
This isn't ideal since the `time base' register is running at a fraction
of the processor cycle speed, so the measurement isn't as precise as x86's
rdtsc.
It's better than nothing though...

r1094
fix detection of pthread and isfinite on OpenBSD

r1093
remove $ECHON kludge, which broke on SunOS. bring back `gcc -MT`.
remove auto-reconfigure on svn update, which has done nothing since we stopped using svn.
fix $AS on sparc (was disabled by mmx check).
fix --extra-asflags (was ignored).
mark bash scripts as bash, not sh
patch partly by Greg Robinson and Jugdish.

r1092
1.6x faster satd_c (and sa8d and hadamard_ac) with pseudo-simd.
60KB smaller binary.
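The general idea behind pseudo-SIMD in plain C is to pack two 16-bit values into one 32-bit word so a single addition processes both. The sketch below shows only that lane-packing trick for non-negative values; it is an assumption about the technique and leaves out the signed differences and absolute values a real SATD needs.

```python
MASK32 = 0xFFFFFFFF  # emulate a 32-bit unsigned integer


def pack2(low: int, high: int) -> int:
    """Pack two values (each 0..65535) into the low and high 16-bit lanes."""
    return ((high << 16) + low) & MASK32


def unpack2(word: int):
    """Split a 32-bit word back into its (low, high) 16-bit lanes."""
    return word & 0xFFFF, (word >> 16) & 0xFFFF


def packed_add(a: int, b: int) -> int:
    """Add both lanes with one addition; valid while no lane sum reaches 65536."""
    return (a + b) & MASK32


total = packed_add(pack2(100, 200), pack2(30, 40))
print(unpack2(total))  # (130, 240)
```

The trick only works while each lane's running sum stays within 16 bits; once the low lane overflows, the carry corrupts the high lane.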

r1091
Hack around a potential failure point in VBV
pred_b_from_p can become absurdly large in static scenes, leading to rare collapses of quality with VBV+B-frames+threads.
This isn't a final fix, but should resolve the problem in most cases in the meantime.

r1090
Much faster chroma encoding and other opts
~15% faster chroma encode by reorganizing CBP calculation and adding special-case idct_dc function, since most coded chroma blocks are DC-only.
Small optimization in cache_save (skip_bp)
Fix array_non_zero to not violate strict aliasing (should eliminate miscompilation issues in the future)
Add in automatic substitutions for some asm instructions that have an equivalent smaller representation.

r1089
add AltiVec implementation of x264_mc_copy_w16_aligned

r1088
add AltiVec implementation of x264_pixel_var_16x16 and x264_pixel_var_8x8

r1087
add AltiVec 16 <-> 32 bits conversions macros

r1086
Replace 16x16=>32 mul + pack + add by a simple 16x16=>16 multiply-add.
Suggested by Loren.

r1085
Eliminate support for direct_8x8_inference=0
The benefit in the most extreme contrived situation was at most 0.001db PSNR, at the cost of slower decoding.
As this option was basically useless, it was a waste of code and prevented some other useful optimizations.
Remove some unused mc code related to sub-8x8 partitions.
Small deblocking speedup when p4x4 is used.
Also remove unused x264_nal_decode prototype from x264.h.

r1084
Add AltiVec and CPU numbers detection on OpenBSD.

r1083
Add AltiVec implementation of predict_8x8c_p. 2.6x faster than scalar C.

r1082
Warn if direct auto wasn't set on the first pass
And, if it wasn't, run direct auto as if it was the first pass, rather than simply forcing temporal direct mode on all frames.
Also a small tweak to coeff_level_run asm.

r1081
Changes the PowerPC ppccommon.h header so it no longer checks for a particular
OS such as Linux but instead looks for HAVE_ALTIVEC_H being set.
Fixes all *BSD/PowerPC builds.

r1080
update x264_hpel_filter_altivec's prototype to match the one of the C version.
in commit 045ae4045a1827555b3eaab4fbf3c9809e98c58f (factorization of mallocs)
Author: Guillaume Poirier <gpoirier@mplayerhq.hu>
Date: Wed Jan 14 21:49:42 2009 +0100
rename vector+array unions to closer match the vector typedefs names.

r1078
Add Altivec implementation of all the remaining 16x16 predict routines.

r1077
Cache ref costs and use more accurate MV costs
New MV costs should improve quality slightly by improving the smoothness of the field of MV costs (and they're closer to CABAC's actual costs).
Despite being optimized for CABAC, they still help under CAVLC, albeit less.
MV cost change by Loren Merritt

r1076
Support forced frametypes with scenecut/b-adapt
This allows an input qpfile to be used to force I-frames, for example.
The same can be done through the library interface.
Document the format of the qpfile in --longhelp and the forcing of frametypes in x264.h
Note that forcing B-frames and B-refs may not always have the intended result.
Patch partially by Steven Walters <kemuri9@gmail.com>.

r1075
Remove an IDIV from i8x8 analysis
Only one IDIV is left in macroblock level code (transform_rd)

r1074
Fix regression in r1066
With some combinations of video width and other settings, the scratch buffer was slightly too small.
This caused heap corruption on some systems.
Also prevent merange from being raised during encoding with esa/tesa through encoder_reconfig, as this no longer works.

r1073
Disable B-frames in lossless mode
They hurt compression anyways, and direct auto was bugged with lossless.

r1072
Factorize in ppccommon.h the conditional inclusion of altivec.h on Linux systems.

r1071
Disable __builtin_clz() intrinsic on gcc versions prior to 3.4.
The function did not exist before that version.

r1070
Small tweaks to coeff asm
Factor out a few redundant pxors
Related cosmetics

r1069
Use the correct strtok under MSVC
Also change one malloc -> x264_malloc

r1068
Add stack alignment for lookahead functions
Should allow libx264 to be called from non-gcc-compiled applications without adding force_align_arg_pointer.

r1067
Add support for SSE4a (Phenom) LZCNT instruction
Significantly speeds up coeff_last and coeff_level_run on Phenom CPUs for faster CAVLC and CABAC.
Also a small tweak to coeff_level_run asm.

r1066
factor mallocs out of hpel, ssim, and esa.
there should now be no memory allocation outside of init-time.

r1065
Much faster CAVLC RDO and bitstream writing
Pure asm version of level/run coding. Over 2x faster than C.
Up to 40% faster CAVLC RDO. Overall benefit up to ~7.5% with RDO or ~5% with fast encoding settings.

r1064
Cosmetics: cleaner syntax for defining temporary registers in asm
Globally define t#[qdwb], so that only t# needs to be locally defined when reorganizing registers

r1063
Much faster CABAC RDO
Since RDO doesn't care about what order bit costs are calculated, merge sigmap and level coding into the same loop in RDO.
This is bit-exact for 4x4dct but slightly incorrect for 8x8dct due to the sigmap containing duplicated contexts.
However, the PSNR penalty of this is extremely small (~0.001db).
Speed benefit is about 15% in 4x4dct and 30% in 8x8dct residual bit cost calculation at QP20.
Overall encoding speed benefit is up to 5%, depending on encoding settings.
Also remove an old unnecessary CABAC table that hasn't been used for years.

r1062
VLC table optimizations
Slightly reorganize VLC tables for ~2% faster block_residual_write_cavlc.
Also a small optimization in p8x8 CAVLC.

r1061
Fix crash in --me esa/tesa introduced in r1058
Also suppress the last mingw warning message

r1060
Optimize variance asm + minor changes
Remove SAD argument from var, not needed anymore.
Speed up var asm a bit by eliminating psadbw and instead HADDWing at end.
Eliminate all remaining warnings on gcc 3.4 on cygwin
Port another minor optimization from lavc (pskip)

r1059
Minor CABAC cleanups and related optimizations
Merge the two list tables to allow cleaner MC/CABAC/CAVLC code
Remove lots of unnecessary {s
Port some very minor opts from lavc

r1058
faster ESA init
reduce memory if using ESA and not p4x4

r1057
More macroblock_cache optimizations
Patch partially by Loren Merritt

r1056
Faster macroblock_cache_rect
Explicit loop unrolling

r1055
Optimizations in predict_mv_direct
Add some early terminations and minor optimizations
This change may also fix the extremely rare direct+threading MV bug.

r1054
Fix visual corruption when picture width was not mod 32.
The previous Altivec implementation of mc_chroma assumed that i_src_stride was always mod 16.

r1053
Add support for FSF GCC version >= 4.3 on OSX.
So far, only Apple GCC version was supported.

r1052
More accurate refcost for p8x8 CAVLC
Slightly better quality, especially in non-RD mode, with CAVLC.

r1051
use lookup tables instead of actual exp/pow for AQ
Significant speed boost, especially on CPUs with atrociously slow floating point units (e.g. Pentium 4 saves 800 clocks per MB with this change).
Add x264_clz function as part of the LUT system: this may be useful later.
Note this changes output somewhat as the numbers from the lookup table are not exact.
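
The idea of trading an exact exp/pow for a table lookup can be sketched as follows. This is not x264's table or fixed-point format: the 64-entry table size, the function names `exp2_lut` and `clz32`, and the use of floating point are all assumptions chosen to keep the example short. It also shows why the output changes slightly: the fractional part is truncated to the nearest table entry.

```python
import math

LUT_BITS = 6                      # 64 entries; the table size is an assumption
LUT_SIZE = 1 << LUT_BITS
# Fractional powers of two, 2**(i/64) for i in 0..63, computed once up front.
EXP2_LUT = [2.0 ** (i / LUT_SIZE) for i in range(LUT_SIZE)]


def exp2_lut(x):
    """Approximate 2**x by splitting x into integer and fractional parts."""
    whole = math.floor(x)
    # Truncating to a table index is the inexact step; clamp to guard
    # against a fractional part that rounds up to exactly 1.0.
    index = min(int((x - whole) * LUT_SIZE), LUT_SIZE - 1)
    return math.ldexp(EXP2_LUT[index], whole)


def clz32(value):
    """Count leading zero bits of a 32-bit unsigned integer."""
    return 32 - value.bit_length()


print(exp2_lut(1.3), 2.0 ** 1.3)
print(clz32(1), clz32(0x80000000))
```

With 64 entries the result is never above the exact value and falls short of it by at most a factor of 2**(1/64), that is, by a little over 1%.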

r1050
Suppress saveptr warnings on Windows GCC

r1049
More small speed tweaks to macroblock.c

r1048
Much faster CAVLC residual coding
Use a VLC table for common levelcodes instead of constructing them on-the-spot
Branchless version of i_trailing calculation (2x faster on Nehalem)
Completely remove array_non_zero_count and instead use the count calculated in level/run coding. Note: this slightly changes output with subme > 7 due to different nonzero counts being stored during qpel RD.

r1047
fix compilation with GCC-4.3+

r1046
High Profile allows 25% higher maxbitrate/cpb
Correct level detection to take this into account.
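
A small sketch of what the 25% allowance means for level checking. The base figures are the Baseline/Main bitrate and CPB limits from the H.264 level table for three levels only; the dictionary, the function names and the "first level that fits" search are hypothetical and much simpler than a real level check, which also considers frame size, macroblock rate and reference count.

```python
# Baseline/Main limits for a few levels: (max bitrate in kbit/s, max CPB in kbit).
BASE_LIMITS = {
    "3.1": (14000, 14000),
    "4.0": (20000, 25000),
    "4.1": (50000, 62500),
}


def level_limits(level, high_profile):
    """Return (max_bitrate, max_cpb) for a level, scaled by 5/4 for High Profile."""
    bitrate, cpb = BASE_LIMITS[level]
    if high_profile:
        return bitrate * 5 // 4, cpb * 5 // 4
    return bitrate, cpb


def lowest_level_for(bitrate, cpb, high_profile):
    """Return the first listed level whose limits cover the request, else None."""
    for level in BASE_LIMITS:  # dicts keep insertion order, lowest level first
        max_bitrate, max_cpb = level_limits(level, high_profile)
        if bitrate <= max_bitrate and cpb <= max_cpb:
            return level
    return None


print(level_limits("4.0", high_profile=True))
print(lowest_level_for(24000, 24000, high_profile=False))
print(lowest_level_for(24000, 24000, high_profile=True))
```

The last two calls show the practical effect: a 24000 kbit/s stream needs a higher level under Main than under High.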

r1045
s/nasm/yasm in VS project file

r1044
Cosmetic: update various file headers.

r1043
add date and compiler to `x264 --version`

r1042
10L in r1041

r1041
Significantly faster CABAC and CAVLC residual coding and bit cost calculation
Early-terminate in residual writing using stored nnz counts
To allow the above, store nnz counts for luma and chroma DC
Add assembly functions to find the last nonzero coefficient in a block
Overall ~1.9% faster at subme9+8x8dct+qp25 with CAVLC, ~0.7% faster with CABAC
Note this changes output slightly with CABAC RDO because it requires always storing correct nnz values during RDO, which wasn't done before in cases it wasn't useful.
CAVLC output should be equivalent.

r1040
dequant_4x4_dc assembly
About 3.5x faster DC dequant on Conroe

r1039
fix an overflow in dct4x4dc_mmx
(unlikely to have occurred in any real video)

r1038
Remove nasm support
Nasm won't correctly parse the SSE4 code introduced a few revisions ago, so we're removing support.
Users should upgrade to yasm 0.6.1 or later.

r1037
Fix rare warning messages in ratecontrol due to r1020

r1036
Fix MSVC compilation and clean up MSVC build file
Remove Release64 which never worked anyways.

r1035
Faster width4 SSD+SATD, SSE4 optimizations
Do satd 4x8 by transposing the two blocks' positions and running satd 8x4.
Use pinsrd (SSE4) for faster width4 SSD
Globally replace movlhps with punpcklqdq (it seems to be faster on Conroe)
Move mask_misalign declaration to cpu.h to avoid warning in encoder.c.
These optimizations help on Nehalem, Phenom, and Penryn CPUs.

r1034
fix indentation, whitespace cleanup, more consistent indentation of macro backslashes

r1033
Change some macros to be more sensitive to memory alignment, thus avoiding
useless loads/stores and calculations of permutation vectors.
Affected functions are all of mc_luma, mc_chroma, 'get_ref', SATD, SA8D and deblock.
Overall gains vary from ~5% to 15%, depending on settings, on a 1.42 GHz G4.

r1032
refactor satd. 20KB smaller binary.
refactor sa8d. slightly faster.
more checkasm for hadamard.

r1031
Fix crash with threads and SSEMisalign on Phenom
Misalign mask needed to be set separately for each encoding thread.

r1030
Phenom CPU optimizations
Faster hpel_filter by using unaligned loads instead of emulated PALIGNR
Faster hpel_filter on 64-bit by using the 32-bit version (the cost of emulated PALIGNR is high enough that the savings from caching intermediate values is not worth it).
Add support for misaligned_mask on Phenom: ~2% faster hpel_filter, ~4% faster width16 multisad, 7% faster width20 get_ref.
Replace width12 mmx with width16 sse on Phenom and Nehalem: 32% faster width12 get_ref on Phenom.
Merge cpu-32.asm and cpu-64.asm
Thanks to Easy123 for contributing a Phenom box for a weekend so I could write these optimizations.

r1029
A few tweaks to decimate asm
A little bit faster on both 32-bit and 64-bit

r1028
Nehalem optimization part 2: SSE2 width-8 SAD
Helps a bit on Phenom as well
~25% faster width8 multiSAD on Nehalem

r1027
Add subme=0 (fullpel motion estimation only)
Only for experimental purposes and ultra-fast encoding. Probably not a good idea for firstpass.

r1026
Fix minor memory leak in r1022

r1025
r1024 borked checkasm
Remove idct/dct2x2 from checkasm as they are no longer in dctf

r1024
Faster chroma encoding
9-12% faster chroma encode.
Move all functions for handling chroma DC that don't have assembly versions to macroblock.c and inline them, along with a few other tweaks.

r1023
Various cosmetics and minor fixes
Disable hadamard_ac sse2/ssse3 under stack_mod4
Fix one MSVC compilation warning
Fix compilation in debug mode in certain cases on x64
Remove eval.c from MSVC project
Fix crash when VBV is used in CQP mode
Patches by MasterNobody

r1022
Faster b-adapt + adaptive quantization
Factor out pow to be only called once per macroblock. Speeds up b-adapt, especially b-adapt 2, considerably.
Speed boost is as high as 24% with b-adapt 2 + b-frames 16.

r1021
Faster CABAC residual encoding
6% faster block_residual_write_cabac in RD mode.

r1020
Fix potential crash in the case that the input statsfile is too short
Also resolve various other potential weirdness (such as multiple copies of the same error message in threaded mode).

r1019
Initial Nehalem CPU optimizations
movaps/movups are no longer equivalent to their integer equivalents on the Nehalem, so that substitution is removed.
Nehalem has a much lower cacheline split penalty than previous Intel CPUs, so cacheline workarounds are no longer necessary.
Thanks to Intel for providing Avail Media with the pre-release Nehalem CPU needed to prepare these (and other not-yet-committed) optimizations.

r1018
Author: Gabriel Bouvigne <bouvigne@mp3-tech.org>
Date: Tue Nov 4 09:56:03 2008 -0800
Fix potential infinite loop in VBV under GCC 4.2

r1017
Encoder_reconfig: esa/tesa can only be enabled if they were on to begin with
Bug report by kemuri-_9.

r1016
Fix bug in hadamard_ac SSE assembly
Some extreme inputs could cause overflows.

r1015
Full sub8x8 RD mode decision
Small speed penalty with p4x4 enabled, but significant quality gain at subme >= 6
As before, gain is proportional to the amount of p4x4 actually useful in a given input at the given bitrate.

r1014
Optimize CABAC bit cost calculation
Speed up cabac mvd and add new precalculated transition/entropy table.
Add "noup" function for cabac operations to not update the state table when it isn't necessary.
1-3% faster macroblock_size_cabac.
Cosmetics

r1013
Replace "git-command" with "git command" in version.sh for git 1.6 support

r1012
Add assembly version of CAVLC 8x8dct interleave
Faster CAVLC encoding and RDO with 8x8dct

r1011
Add support for psy-rd/trellis to encoder_reconfig

r1010
Fix Darwin speed regression

r1009
Further improve prediction of bitrate and VBV in threaded mode

r1008
Sub-8x8 Qpel-RD in P-frames
Improves quality when using p8x4/p4x8/p4x4 subpartitions
Benefit is proportional to how many sub-8x8 partitions are used; helps most at high bitrates and low resolutions.

r1007
Faster qpel-RD
3-4% faster qpel-RD; avoid re-checking bmv/pmv during the hex search.

r1006
Some minor optimizations in RD refinement
Don't write b subpartition in CABAC RDO
Calculate nonzero count in i4x4 CAVLC RDO

r1005
Faster deblocking when p4x4 isn't used
Most of the MV checks can be skipped, resulting in faster strength calculation

r1004
Print profile and level information upon starting encode
Previously level was only printed as part of autodetect, and only in verbose mode.

r1003
Fix possible crash in trellis at very low QPs

r1002
Add assembly versions of decimate_score
3-7x faster decimation, 1-3% faster overall

r1001
Fix typo in subme8/9 lossless qpel-RD
Slightly improves compression.

r1000
Extend trellis to support luma/chroma DC and chroma AC
Small speed loss in trellis 1, slightly larger in trellis 2, but significant quality improvement.

r999
rm gtk, avc2avi.
I don't remember why I allowed a gui into the repository in the first place. There's nothing that makes this one special relative to all the other x264 guis.
avc2avi doesn't compile since we removed the bitstream reader. And avc doesn't belong in avi.

r998
Resolve quality regression in r996
Accidentally removed the wrong line of code. I think this classifies as a "10l".
Thanks to techouse for initial bug report and skystrife for helping me find it.

r997
Fix minor memory leak accidentally added with the addition of b-adapt 2

r996
Rework subme system, add RD refinement in B-frames
The new system is as follows: subme6 is RD in I/P frames, subme7 is RD in all frames, subme8 is RD refinement in I/P frames, and subme9 is RD refinement in all frames.
subme6 == old subme6, subme7 == old subme6+brdo, subme8 == old subme7+brdo, subme9 == no equivalent
--b-rdo has, accordingly, been removed. --bime has also been removed, and instead enabled automatically at subme >= 5.
RD refinement in B-frames (subme9) includes both qpel-RD and an RD version of bime.
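
For anyone converting old command lines, the mapping described above can be written down as a small helper. `convert_subme` is a hypothetical function; it assumes that levels below 6 keep their numbers, and it returns None for old subme 7 without --b-rdo because the list above names no equivalent for that combination.

```python
def convert_subme(old_subme, b_rdo):
    """Map an old (--subme, --b-rdo) pair to the reworked --subme value."""
    if old_subme < 6:
        return old_subme        # assumption: lower levels are unchanged
    if old_subme == 6:
        return 7 if b_rdo else 6
    if old_subme == 7:
        return 8 if b_rdo else None   # no listed equivalent without b-rdo
    raise ValueError("old subme values only went up to 7")


for old, brdo in [(6, False), (6, True), (7, True), (7, False)]:
    print(old, brdo, "->", convert_subme(old, brdo))
```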

r995
Fix potential miscompilation of some inline asm
Caused problems under some gcc 4.x versions with predictive lossless

r994
Replace High 4:4:4 profile lossless with High 4:4:4 Predictive.
This improves lossless compression by about 4-25% depending on source.
The benefit is generally higher for intra-only compression.
Also add support for 8x8dct and i8x8 blocks in lossless mode; this improves compression very slightly.
In some rare cases 8x8dct can hurt compression in lossless mode, but it's usually helpful, albeit marginally.
Note that 8x8dct is only available with CABAC as it is never useful with CAVLC.
High 4:4:4 Predictive replaced the previous profile in a 2007 revision to the H.264 standard.
The only known compliant decoder for this profile is the latest version of CoreAVC.
As I write this, JM does not actually correctly decode this profile.
This lack of support will soon change with this commit, as x264 will be (to my knowledge) the first compliant encoder.

r993
Date: Fri Sep 26 09:19:56 2008 -0700
Fix typo in progress indicator when using piped input

r992
avg_weight_ssse3

r991
fix bitstream writer on bigendian 64bit (regression in r903)

r990
remove authors whose code no longer exists

r989
more diagnostics when configure finds an unsuitable assembler

r988
Make x264 progress indicator more concise
Now the % indicator should be readable on the header of a minimized window on Windows systems.

r987
Fix deblocking + threads + AQ bug
At low QPs, with threads and deblocking on, deblocking could be improperly disabled.
Revision in which this bug was introduced is unknown; it may be as old as b_variable_qp in x264 itself.

r986
Resolve possible crash in bime, improve the fix in r985

r985
Fix rare crash issue in b-adapt
Regression *probably* in r979

r984
Merging Holger's GSOC branch part 1: hpel_filter speedups

r983
r980 borked weighted bime

r982
Disable I_PCM with psy-RD
psy-RD seems to put the PCM threshold a bit lower than it should be, so PCM is now disabled under psy-RD.

r981
Merge avg and avg_weight
avg_weight no longer has to be special-cased in the code; faster weightb

r980
Rewrite avg/avg_weight to take two source pointers
This allows the use of get_ref instead of mc_luma almost everywhere for bipred

r979
Use low-resolution lookahead motion vectors as an extra predictor
Improves quality considerably (0-5%) in 1pass/CRF mode, especially with lower --me values and complex motion.
Reverses the order of lowres lookahead search to improve the usefulness of the extra predictors.

r978
Add missing free() for f_qp_offset in frame.c

r977
Correct misprediction of bitrate in threaded mode
Improves bitrate accuracy in cases with large numbers of threads.
Loosely based on a patch by BugMaster.

r976
Fix a case in which VBV underflows can occur
Fix a potential case where a frame might be initially allocated too low a QP, which would then have to be raised a lot during row-based ratecontrol.
In some cases, this could even produce VBV underflows in 2pass mode.

r975
Use correct format specifier for uint64_t

r974
Cache motion vectors in lowres lookahead
This vastly speeds up b-adapt 2, especially at large bframes values.
This changes output because now MV prediction in lookahead only uses L0/L1 MVs, not bidir. This isn't a problem, since the bidir prediction wasn't really correct to begin with, so the change in output is neither positive nor negative.
This also allowed the removal of some unnecessary memsets, which should also give a small speed boost.
Finally, this allows the use of the lowres motion vectors for predictors in some future patch.

r973
Fix regression in b-adapt patch: encoder_open failed for multipass encodes without bframes.

r972
Stop SAR in y4m input from overriding --sar on commandline

r971
hadamard_ac for psy-rd
c version is 1.7x faster than satd+sa8d+sad
ssse3 version is 2.3x faster than satd+sa8d+sad

r970
Psychovisually optimized rate-distortion optimization and trellis
The latter, psy-trellis, is disabled by default and is reserved as experimental; your mileage may vary.
Default subme is raised to 6 so that psy RD is on by default.

r969
Add optional more optimal B-frame decision method
This method (--b-adapt 2) uses a Viterbi algorithm somewhat similar to that used in trellis quantization.
Note that it is not fully optimized and is very slow with large --bframes values.
It also takes into account weightb, which should improve fade detection.
Additionally, changes were made to cache lowres intra results for each frame to avoid recalculating them. This should improve performance in both B-frame decision methods.
This can also be done for motion vectors, which will dramatically improve b-adapt 2 performance when it is complete.
This patch also reads b_adapt and scenecut settings from the first pass so that the x264 header information in the output file will have correct information (since frametype decision is only done on the first pass).

r968
Move adaptive quantization to before ratecontrol, eliminate qcomp bias
This change improves VBV accuracy and improves bit distribution in CRF and 2pass.
Instead of being applied after ratecontrol, AQ becomes part of the complexity measure that ratecontrol uses.
This allows for modularity for changes to AQ; a new AQ algorithm can be introduced simply by introducing a new aq_mode and a corresponding if in adaptive_quant_frame.
This also allows quantizer field smoothing, since quantizers are calculated beforehand rather than during encoding.
Since there is no more reason for it, aq_mode 1 is removed. The new mode 1 is in a sense a merger of the old modes 1 and 2.
WARNING: This change redefines CRF when using AQ, so output bitrate for a given CRF may be significantly different from before this change!

r967
Fix crash when using b-adapt at resolutions 32x32 or below.
Original patch by BugMaster, but was mostly rewritten in order to make b-adapt actually *work* at such resolutions, not merely stop crashing.

r966
Add title-bar progress indicator under WIN32
Also add bitrate-so-far output when piping data to x264 (total frames not known)
Patch mostly by recover from Doom9.

r965
Revert part of r963
In some rare (but significant) cases, the optimized nal_encode algorithm gave incorrect results.

r964
Predict 4x4_DC asm
Also remove 5-year-old unnecessary #define that reduced speed unnecessarily under MSVC-compiled builds

r963
Faster NAL unit encoding and remove unused nal_decode
Small speedup at very high bitrates
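
Part of writing a NAL unit is inserting emulation prevention bytes so that the payload can never contain a start-code-like pattern. The sketch below shows that escaping step in its plainest form: after two consecutive zero bytes, a 0x03 is inserted before any byte that is 0x03 or smaller. `nal_escape` is a hypothetical name and this is not the optimized routine the commit refers to.

```python
def nal_escape(payload):
    """Insert 0x03 after any two zero bytes that precede a byte <= 0x03."""
    out = bytearray()
    zeros = 0  # number of consecutive zero bytes just written
    for byte in payload:
        if zeros >= 2 and byte <= 3:
            out.append(3)   # emulation prevention byte
            zeros = 0
        out.append(byte)
        zeros = zeros + 1 if byte == 0 else 0
    return bytes(out)


print(nal_escape(b"\x00\x00\x01").hex())
print(nal_escape(b"\x00\x00\x00\x00").hex())
```

A byte-at-a-time loop like this is easy to get right; faster variants that scan several bytes at once have more edge cases to cover.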

r962
CAVLC cleanup and optimizations
Also move some small functions in macroblock.c to a .h file so they can be inlined.

r961
Faster avg_weight assembly
Unrolling the loop a bit improves performance

r960
Faster H asm intra prediction functions
Take advantage of the H prediction method invented for merged intra SAD and apply it to regular prediction, too.

r959
Add merged SAD for i16x16 analysis
Roughly 30% faster i16x16 analysis under subme=1

r958
Add sad_aligned for faster subme=1 mbcmp
Distinguish between unaligned and aligned uses of mbcmp
SAD_aligned, for MMX SADs, uses non-cacheline SADs.

r957
Improve progress indicator
Show average bitrate so far during encoding
Decrease update interval for longer encodes (max of 10 frames encoded between updates)

r956
Fix speed regression in r951
Row SATDs are only necessary in VBV mode, so don't need to be checked if VBV is off.

r955
zigzag asm

r954
fix SOFLAGS used when building gtk frontend
patch by Markus Kanet %darkvision A gmx P eu%

r953
remove the distinction between itex and ptex
(changes 2pass statsfile format)

r952
hardcode the ratecontrol equation, and remove the rceq option

r951
Fix some uses of uninitialized row_satd values in VBV
Resolves some issues with QP51 in I-frames with scenecut

r950
Activate trellis in p8x8 qpel RD
Also clean up macroblock.c with some refactoring
Note that this change significantly reduces subme7+trellis2 performance, but improves quality.
Issue originally reported by Alex_W.

r949
Improve VBV accuracy
Don't use the previous frame's row SATD as a predictor if it is too different from this frame's row SATD.

r948
improve generation of Darwin libraries
Patch by vmrsss %vmrsss A gmail P com%

r947
Fix compilation in gcc 3.4.x (issue in r946)
Due to a bug in gcc 3.4.x, in certain cases of inlining, the array_non_zero_int_mmx inline assembly is miscompiled and causes a crash with --subme 7 --8x8dct.
This minor hack fixes this issue.

r946
shut up various gcc warnings

r945
fix a crash with invalid args and --thread-input (introduced in r921)

r944
drop support for x86_32 PIC.

r943
use permute macros in satd
move some more shared macros to x264util.asm

r942
cosmetics

r941
r940 broke threads

r940
Cleanups in macroblock_cache_save/load
A bit more loop unrolling, and moving some constant code to the global init function

r939
Deblocking code cleanup and cosmetics
Convert the style of the deblocking code to the standard x264 style
Eliminate some trailing whitespace

r938
4% faster deblock: special-case macroblock edges
Along with a bit of related code reorganization and macroification

r937
Add dedicated variance function instead of using SAD+SSD
Faster variance calculation
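
The reason a single dedicated function can replace two calls is that the variance needs only the sum and the sum of squares of the pixels, and both can be accumulated in one pass. A minimal sketch, with the hypothetical name `block_variance`, returning the unnormalized value (N times the variance) in integer arithmetic:

```python
def block_variance(pixels):
    """Return sum(p*p) - sum(p)**2 / N for a flat list of pixel values."""
    total = 0
    total_sq = 0
    for p in pixels:
        total += p          # what a SAD against zero would give
        total_sq += p * p   # what an SSD against zero would give
    n = len(pixels)
    return total_sq - (total * total) // n


flat = [128] * 256            # a flat 16x16 block has zero variance
checker = [0, 2] * 128        # alternating 0/2: every pixel is 1 from the mean
print(block_variance(flat), block_variance(checker))
```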

r936
6% faster deblock: remove some clips, earlier termination on low qps.

r935
Faster deblocking
Early termination for bS=0, alpha=0, beta=0
Refactoring, various other optimizations
About 30% faster deblocking overall.

r934
asm cosmetics

r933
yet another posix-emulating define on solaris

r932
update msvc projectfile

r931
drop support for msvc6

r930
Prevent VBV from lowering quantizer too much
This code seemed to act up unexpectedly sometimes, creating a situation where in 1-pass VBV mode, a frame's quantizer would drop all the way to qpmin and then shoot back upwards to qpmax, causing serious visual issues.
This change may decrease bitrate in VBV mode, but that is preferable to the artifacting produced by this code.

r929
Improve subme7 at low QPs and add subme7 support in lossless mode

r928
cosmetics: merge x86inc*.asm

r927
Add missing x264util.asm

r926
Basic sanity checking of qpmax/qpmin options

r925
Fix regression in r922
set the chroma DC coefficients to zero for residual coding in qpel-rd
fix C99ism

r924
Refactor asm macros part 2: DCT

r923
Refactor asm macros part 1: DCT

r922
Improve intra RD refine, speed up residual_write_cabac
a do/while loop can be used for residual_write, but i8x8 had to be fixed so that it wouldn't call residual_write with zero coeffs
proper nnz handling added to cabac intra rd refine
chroma cbp added to 8x8 chroma rd
cbp was tested, but wasn't useful

r921
Fix a few more minor memleaks

r920
stats summary: print distribution of numbers of consecutive B-frames

r919
add interlacing to the list of stuff checked by x264_validate_levels

r918
Fix C99-ism in r907

r917
Faster temporal predictor calculation
This is a separate commit because it changes rounding, and thus changes output slightly.

r916
Date: Thu Jul 17 07:55:24 2008 -0600
Align lowres planes for improved cacheline split performance

r915
autodetect level based on resolution/bitrate/refs/etc, rather than defaulting to L5.1
if vbv is not enabled (and especially in crf/cqp), we have to guess max bitrate, so we might underestimate the required level.

r914
fix bs_write_ue_big for values >= 0x10000.
(no immediate effect, since nothing writes such values yet)
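
For context, ue(v) is the unsigned Exp-Golomb code used throughout H.264 headers: the value plus one is written in binary, preceded by one zero bit for every bit after the first. The sketch below builds the code as a string of bits to show how its length grows; `ue_bits` is a hypothetical helper, and nothing here reflects how x264's bitstream writer is implemented.

```python
def ue_bits(value):
    """Return the unsigned Exp-Golomb code ue(v) as a string of '0' and '1'."""
    code = value + 1
    length = code.bit_length()
    return "0" * (length - 1) + format(code, "b")


for v in (0, 1, 2, 3):
    print(v, ue_bits(v))
print(len(ue_bits(0xFFFE)), len(ue_bits(0x10000)))
```

The last line shows that codes for values in this range no longer fit in a single 32-bit write, so a writer for large values has to emit them in more than one piece.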

r913
Fix lossless mode borked in r901

r912
Relax QPfile restrictions
Allow a QPfile to contain fewer frames than the total number of frames in the video and have ratecontrol fill in the rest.
Patch by kemuri9.

r911
Limit MVrange correctly in interlaced mode
Bug report by Sigma Designs, Inc.

r910
Fix bug with PCM and adaptive quantization
In rare cases CABAC desync could occur, causing bitstream corruption

r909
Fix memory leak upon x264 closing
Doesn't affect the CLI, but potentially important for programs which call x264 as a shared library.

r908
Fix compilation on PPC systems (borked in r903)
Bigendian systems didn't have endian_fix32 defined

r907
Add L1 reflist and B macroblock types to x264 info
Also remove display of "PCM" if PCM mode is never used in the encode.
L1 reflist information will only show if pyramid coding is used.

r906
Fix and enable I_PCM macroblock support
In RD mode, always consider PCM as a macroblock mode possibility
Fix bitstream writing for PCM blocks in CAVLC and CABAC, and a few other minor changes to make PCM work.
PCM macroblocks improve compression at very low QPs (1-5) and in lossless mode.

r905
de-duplicate vlc tables

r904
faster ue/se/te write

r903
faster bs_write

r902
cosmetics in ssd asm

r901
Various optimizations and cosmetics
Update AUTHORS file with Gabriel and me
update XCHG macro to work correctly in if statements
Add new lookup tables for block_idx and fdec/fenc addresses
Slightly faster array_non_zero_count_mmx (patch by holger)
Eliminate branch in analyse_intra
Unroll loops in and clean up chroma encode
Convert some for loops to do/while loops for speed improvement
Do explicit write-combining on --me tesa mvsad_t struct
Shrink --me esa zero[] array
Speed up bime by reducing size of visited[][][] array

r900
Resolve floating point exception with frame_init_lowres mmx
In some cases, the mmx version of frame_init_lowres could leave the FPU uninitialized for use in ratecontrol, resulting in floating point exceptions.
Since frame_init_lowres is such a time-consuming function, an emms was just put at the end, since it costs almost nothing compared to the total time of frame_init_lowres.

r899
Update my email address

r898
Update file headers throughout x264
Update "Authors" lists based on actual authorship; highest is most important
Update copyright notices and remove old CVS tags from file headers
Add file headers to GTK and other sections missing them
Update FSF address
Other header-related cosmetics

r897
denoise_dct asm

r896
cosmetics in permutation macros
SWAP can now take mmregs directly, rather than just their numbers

r895
Fix bug in adaptive quantization
In some cases adaptive quantization did not correctly calculate the variance.
Bug reported by MasterNobody

r894
lowres_init asm
rounding is changed for asm convenience. this makes the c version slower, but there's no way around that if all the implementations are to have the same results.

r893
Optimizations and cosmetics in macroblock.c
If an i4x4 dct block has no coefficients, don't bother with dequant/zigzag/idct. Not useful for larger sizes because the odds of an empty block are much lower.
Cosmetics in i16x16 to be more consistent with other similar functions.
Add an SSD threshold for chroma in probe_skip to improve speed and minimize time spent on chroma skip analysis.
Rename lambda arrays to lambda_tab for consistency.

r892
some asm functions require aligned stack. disable these when compiling with msvc/icc.

r891
Move bitstream end check to macroblock level
Additionally, instead of silently truncating the frame upon reaching the end of the buffer, reallocate a larger buffer instead.

r890
Convert NNZ to raster order and other optimizations
Converting NNZ to raster order simplifies a lot of the load/store code and allows more use of write-combining.
More use of write-combining throughout load/save code in common/macroblock.c
GCC has aliasing issues in the case of stores to 8-bit heap-allocated arrays; dereferencing the pointer once avoids this problem and significantly increases performance.
More manual loop unrolling and such.
Move all packXtoY functions to macroblock.h so any function can use them.
Add pack8to32.
Minor optimizations to encoder/macroblock.c
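
To make "raster order" concrete: H.264 numbers the sixteen 4x4 luma blocks of a macroblock in a nested 2x2 pattern, so consecutive indices are not consecutive in memory when counts are stored row by row. The sketch below converts a block index to its (x, y) position and raster index. The function names are hypothetical, and the layout of x264's internal caches is not modelled here.

```python
def block_xy(idx):
    """Return (x, y) of 4x4 luma block `idx` (0..15) within its macroblock."""
    x = (idx & 1) | ((idx >> 1) & 2)
    y = ((idx >> 1) & 1) | ((idx >> 2) & 2)
    return x, y


def block_to_raster(idx):
    """Return the row-major index (0..15) of 4x4 luma block `idx`."""
    x, y = block_xy(idx)
    return y * 4 + x


# Print the macroblock as a 4x4 grid of block indices.
grid = [[0] * 4 for _ in range(4)]
for idx in range(16):
    x, y = block_xy(idx)
    grid[y][x] = idx
for row in grid:
    print(row)
```

Once counts are stored in raster order, a whole row of four 8-bit counts is contiguous and can be loaded or stored as one 32-bit word.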

r889
mc_chroma_sse2/ssse3

r888
checkasm --bench=function_name

r887
interleave psnr/ssim computation with reference frame filtering, to improve cache coherency

r886
Add more inline asm and a runtime check for MMXEXT support
x264 will now terminate gracefully rather than SIGILL when run on a machine with no MMXEXT support.
A configure option is now available to build x264 without assembly, for use on such old CPUs as the Pentium 2, K6, etc.

r885
Use aligned memcpy for x264_me_t struct and cosmetics

r884
Cosmetics and loop unrolling
GCC is not very good at loop unrolling in cases where it can perform constant propagation, so the unrolling unfortunately has to be done manually.

r883
Fix regression in 64-bit in r882
i_mvc needs to be 64-bit when used with a 64-bit memory pointer

r882
More tweaks to me.c
Added inline MMX version of UMH's predictor difference test
Various cosmetics throughout me.c
Removed a C99-ism introduced in r878.

r881
Fix regression in r736
r736 added intra RD refinement to B-frames; however, it is possible for subme=7 to be used without b-rdo.
This means intra RD isn't run, and therefore it is possible for intra chroma analysis to not have been run, since update_cache was never called for an intra block, and chroma ME is not required even at subme=7.
r801, which removed a memset, made this worse because previously the chroma prediction mode was at least initialized to zero; now it was not initialized at all.
Therefore, --no-chroma-me, --subme 7, and no --b-rdo had the potential to crash.
This change restricts intra RD refinement to only be run when --b-rdo is enabled (sensible to begin with), thus preventing a crash in this case.

r880
Fix regression in r850
Bug resulted in rare incorrect chroma encoding

r879
Cosmetics in VBV handling

r878
Tweaks and cosmetics in me.c
Use write-combining for predictor checking and other tweaks.

r877
Partially inline trellis quantization
Inlining trellis into the 4x4/8x8 trellis wrappers increases trellis speed by about 5-10% through constant propagation.

r876
Various cosmetic changes.

r875
avg_weight_sse2

r874
many changes to which asm functions are enabled on which cpus.
with Phenom, 3dnow is no longer equivalent to "sse2 is slow", so make a new flag for that.
some sse2 functions are useful only on Core2 and Phenom, so make a "sse2 is fast" flag for that.
some ssse3 instructions didn't become useful until Penryn, so yet another flag.
disable sse2 completely on Pentium M and Core1, because it's uniformly slower than mmx.
enable some sse2 functions on Athlon64 that always were faster and we just didn't notice.
remove mc_luma_sse3, because the only cpu that has lddqu (namely Pentium 4D) doesn't have "sse2 is fast".
don't print mmx1, sse1, nor 3dnow in the detected cpuflags, since we don't really have any such functions. likewise don't print sse3 unless it's used (Pentium 4D).

r873
enable ssse3 phadd satd on Penryn.

r872
benchmark most of the asm functions (checkasm --bench).

r871
Cosmetic: fix C99-ism

r870
Use a gaussian window for cplxblur
Cplxblur was originally intended to use a gaussian window, but in its current form did not. This change provides a tiny improvement to 2pass ratecontrol.
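
A Gaussian window simply means that, when a frame's complexity is averaged with its neighbours, the weights fall off as exp(-d²/2σ²) with distance d. The sketch below shows that weighting on a plain list. `gaussian_blur` and its `sigma` parameter are hypothetical, and the relation between `sigma` and the actual cplxblur setting is not specified here.

```python
import math


def gaussian_blur(values, center, sigma):
    """Weighted average of `values` around index `center` with Gaussian weights."""
    weighted = 0.0
    weight_sum = 0.0
    for i, v in enumerate(values):
        w = math.exp(-((i - center) ** 2) / (2.0 * sigma * sigma))
        weighted += w * v
        weight_sum += w
    return weighted / weight_sum


complexity = [10.0, 10.0, 100.0, 10.0, 10.0]   # one complex frame in the middle
print(gaussian_blur(complexity, 2, 1.0))       # centred on the spike
print(gaussian_blur(complexity, 0, 1.0))       # two frames away from it
```

Compared with a box average, nearby frames dominate and distant ones fade out smoothly instead of dropping off at a hard edge.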

r869
cosmetics

r868
nasm compatible NX stack

r867
CQP is incompatible with AQ

r866
memzero_aligned_mmx

r865
binmode stdin on mingw, not just msvc

r864
omit redundant mc after non-rdo dct size decision, and in b-direct rdo

r863
allow fractional CRF values with AQ.

r862
fix some uninitialized partitions in rdo

r861
2-pass VBV support and improved VBV handling
Dramatically improves 1-pass VBV ratecontrol (especially CBR) and provides support for VBV in 2-pass mode. This consists of a series of functions that attempts to find overflows and underflows in the VBV from the first-pass statsfile and fix them before encoding.
1-pass VBV code partially by Dark Shikari.
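The idea of scanning first-pass frame sizes for VBV overflows and underflows can be sketched with a simple leaky-bucket model. This is a minimal illustration, not x264's implementation: the function name, its parameters, and the clamp-and-continue recovery are assumptions.

```python
def find_vbv_violations(frame_bits, bufsize, maxrate, fps, init_fill):
    """Simulate a leaky-bucket VBV and report underflow/overflow frame indices."""
    fill = init_fill
    underflows, overflows = [], []
    for n, bits in enumerate(frame_bits):
        fill -= bits                 # the decoder removes the frame from the buffer
        if fill < 0:
            underflows.append(n)     # frame was larger than the data available
            fill = 0                 # assumed recovery so the scan can continue
        fill += maxrate / fps        # the channel refills for one frame period
        if fill > bufsize:
            overflows.append(n)      # buffer is full: bitrate budget goes unused
            fill = bufsize
    return underflows, overflows
```

A second pass could then shrink the frames around each reported underflow and spend the unused budget around each overflow, which is the kind of fix the entry describes.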

r860
Fix noise reduction in threaded mode.
Previously enabling noise reduction with threads had no effect.
Note that this is not an optimal solution; each thread still tracks noise reduction separately (unlike in single-threaded mode).

r859
fix a crash on win32 with threads.
r852 introduced an assumption in deblock that the stack is aligned.

r858
remove nasm version check. a feature check is all that's needed.
silence stderr in yasm version check.

r857
cosmetics in cabac

r856
faster residual_write_cabac

r855
change DEBUG_DUMP_FRAME to run-time --dump-yuv

r854
x264_median_mv_mmxext
this is the first non-runtime-detected use of mmxext, but it has to be inlined
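H.264 predicts a motion vector as the component-wise median of three neighbouring vectors, which is the operation the function name suggests. A scalar Python equivalent is sketched below; the helper names are hypothetical, and the real function is hand-written mmxext assembly.

```python
def median3(a, b, c):
    """Median of three integers using only min/max (no sorting)."""
    return max(min(a, b), min(max(a, b), c))

def median_mv(mv_a, mv_b, mv_c):
    """Component-wise median of three (x, y) motion vectors."""
    return (median3(mv_a[0], mv_b[0], mv_c[0]),
            median3(mv_a[1], mv_b[1], mv_c[1]))
```

The min/max form maps directly onto packed SIMD min/max instructions, which is presumably why it is attractive to implement in mmxext.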

r853
factor duplicated code out of deblock chroma mmx

r852
deblock_luma_intra_mmx

r851
write aspect ratio in mp4

r850
omit delta_quant in i16x16 blocks with no residual
(all other block types were already covered, but i16x16 cbp is special)

r849
explicit write combining, because gcc fails at optimizing consecutive memory accesses

r848
force unroll macroblock_load_pic_pointers
and a few other minor optimizations

r847
quant_2x2_dc_ssse3

r846
r836 borked lossless cabac nnz

r845
use elf instead of a.out on netbsd

r844
fix x264_realloc when not using libc realloc.

r843
don't pretend to support win64. remove all related code.
it hasn't worked since probably some time in 2005, and won't ever be fixed unless someone steps up to maintain it.

r842
cosmetics: replace last instances of parm# asm macros with r#

r841
remove DEBUG_BENCHMARK

r840
faster probe_skip

r839
drop support for pre-SSE3 assemblers

r838
s/x264_cpu_restore/x264_emms/
no point in giving it a generic name when it's not generic

r837
faster cabac_mb_cbp_luma
ported from ffmpeg

r836
remove some redundant nnz counts
move some nnz counts from macroblock_encode to cavlc if cabac doesn't need them

r835
compute missing nnz count in subme7 cavlc

r834
remove a division in macroblock-level bookkeeping

r833
omit P/B-skip mc from macroblock_encode if the pixels haven't been overwritten since probe_skip

r832
earlier termination in SEA if mvcost exceeds residual

r831
remove void* arithmetic from r821

r830
Fix define of illegal function identifiers (as defined in section "7.1.3 Reserved identifiers" of C99 spec)

r829
Fix define of illegal identifier (as defined in section "7.1.3 Reserved identifiers" of C99 spec) "__UNUSED__", and use the one defined in common/osdep.h, i.e. "UNUSED"
based on a patch by Diego Biurrun

r828
more consistent include name (in line with other PPC includes)

r827
fix illegal identifiers in multiple inclusion guards
patch by Diego Biurrun % diego A biurrun P de %

r826
AQ now treats perfectly flat blocks as low energy, rather than retaining previous block's QP.
fixes occasional blocking in fades.

r825
checkasm cabac

r824
s/movdqa/movaps/g

r823
--asm to allow testing of different versions of asm without recompile

r822
copy left neighbor pixels directly from previous mb instead of main plane

r821
cacheline split workaround for mc_luma

r820
add "SECTION_RODATA" before "SECTION .text" to setup the fakegot label used in macho binaries.
This fixes compilation with --enable-pic
Requires Yasm 0.7.0 or newer
Patch by Dave Lee % davelee P com A gmail P com %

r819
more hpel fixes

r818
update msvc projectfile

r817
r810 borked hpel_filter_sse2 on unaligned buffers

r816
threads=auto on multicore now implies thread input, just like explicit thread numbers already did

r815
dct4 sse2

r814
faster x86_32 dct8

r813
macros to deal with macros that permute their arguments

r812
mmx cachesplit sad of non-square sizes checked height instead of width

r811
sfence after nontemporal stores

r810
simplify hpel filter asm (move control flow to C) and add sse2, ssse3 versions

r809
more mmx/xmm macros (mova, movu, movh)

r808
improve handling of cavlc dct coef overflows
support large coefs in high profile, and clip to allowed range in baseline/main

r807
fix shared libs on MacOSX
based on a patch by İsmail Dönmez

r806
typo in r803

r805
fix a crash on mp4 muxing with invalid params

r804
variance-based psy adaptive quantization
new options: --aq-mode --aq-strength
AQ is enabled by default

r803
fix naming of .dll on mingw

r802
don't distinguish between mingw and cygwin

r801
remove a memset

r800
typo. don't evaluate rd pskip when p16x16 found ref>0.

r799
r784 borked lossless dc zigzag

r798
fix an arithmetic overflow that disabled SEA threshold after finding a mv with SAD < mvcost.

r797
fix hpel_filter_altivec picked up by checkasm
Patch by Manuel %maaanuuu A gmx.net % and Noboru Asai % noboru P asai A gmail P com %

r796
faster residual

r795
nasm doesn't like align(nop) in structs

r794
reduce the size of some cabac arrays

r793
use cabac context transition table from trellis in normal residual coding too

r792
rearrange cabac struct to reduce code size

r791
higher precision RD lambda
improves quality at QP<=12.

r790
faster cabac_encode_ue_bypass

r789
cabac asm.
mostly because gcc refuses to use cmov.
28% faster than c on core2, 11% on k8, 6% on p4.

r788
cosmetics in cabac

r787
inline cabac_size_decision

r786
cosmetics in DECLARE_ALIGNED

r785
don't distinguish between luma4x4 and luma4x4ac

r784
faster lossless zigzag

r783
more alignment

r782
add tesa and lossless to fprofile

r781
cosmetics in residual_write

r780
remove unused bitstream reader

r779
cosmetics in quant asm

r778
special case dequant for flat matrix

r777
faster dequant

r776
simplify hpel_filter_c

r775
use x264_mc_copy_w16_sse2 in mc.copy, it was previously only in mc_luma

r774
new ssd_8x*_sse2
align ssd_16x*_sse2
unroll ssd_4x*_mmx

r773
update altivec zigzags

r772
r768 borked cavlc

r771
cosmetics in intra predict

r770
faster intra predict 8x8 hu/hd

r769
reduce zigzag arrays from int to int16_t

r768
reduce the size of some arrays

r767
skip intra pred+dct+quant in cases where it's redundant (analyse vs encode)
large speedup with trellis=2, small speedup with trellis=0 and/or subme>=6

r766
cosmetics in asm

r765
satd_4x4_ssse3

r764
get_ref_sse2

r763
continue instead of crash when the threading mv constraint is violated.
doesn't fix the underlying bug, but hopefully less annoying until we find it.

r762
remove remaining reference to clip1.h

r761
fix name mangling again.
apparently it's not just a convention, dll build fails if you try to export a non-prefixed name.

r760
update msvc projectfile

r759
missing #ifdef HAVE_SSE3

r758
don't define offsetof since it's standard

r757
shut up gcc warning in offsetof

r756
increase alignment of mv arrays

r755
memcpy_aligned_sse2

r754
checkasm check whether callee-saved regs are correctly saved
x86_32 only for now since x86_64 varargs are annoying

r753
fix x86_32 ads which failed to preserve a register

r752
fix some name mangling issues introduced by the merge

r751
remove x264_mc_clip1.
it's wrong for sufficiently perverse inputs, and clip_uint8 is faster anyway.

r750
merge x86_32 and x86_64 asm, with macros to abstract calling convention and register names

r749
git compatible version script

r748
check for broken versions of yasm

r747
increase the alignment of the i8x8 edge cache, needed for sse2 intra prediction.
patch by Alexander Strange.

r746
.gitignore

r745
pic macros now keep track of which register holds the GOT, so variable access doesn't have to care

r744
remove x86_64 predict_8x8_ddl_mmxext because sse2 is faster even on amd

r743
cosmetics in dsp init

r742
sse2 16x16 intra pred.
port the remaining intra pred functions from x86_64 to x86_32.
patch by Dark Shikari.

r741
some simplifications to mmx intra pred that should have been done way back when we switched to constant fdec_stride.
and remove pic spills in functions that have a free caller-saved reg.
patch partly by Dark Shikari.

r740
faster array_non_zero

r739
x86_32 sse2 idct8
ported from ffmpeg by Dark Shikari

r738
checkasm: relax the threshold for floating-point ssim

r737
checkasm: test idct with the range of coefficients that can really be encountered, as opposed to random numbers which might overflow.

r736
intra_rd_refine in B-frames

r735
print average of macroblock QPs instead of frame's nominal QP

r734
update date

r733
remove colorspace conversion support, because it has no business in any codec

r732
misc fixes in checkasm

r731
remove a useless bit of me=umh (originally copied from JM, where it was used for something)

r730
fix a memleak in cqm

r729
fix a memleak in mkv muxer
patch by saintdev

r728
satd exhaustive motion search (--me tesa)

r727
fix cabac context for nonzero delta_qp of the 2nd mb of a frame in interlaced mode

r726
fix mapping of mvs to partitions in p4x4_chroma
patch by Noboru Asai

r725
fix mvp for b16x8 and b8x16 L1 search
patch by Wei-Yin Chen

r724
shave a couple cycles off cabac functions

r723
faster and smaller x264_macroblock_cache_mv etc

r722
configure test for endianness

r721
change the meaning of --ref: it now selects DPB size (including B-frames), rather than L0 size (which B-frames are added to)

r720
add / fix support for FreeBSD, based on a patch by Igor Mozolevsky % igor A hybrid-lab P co P uk %

r719
shut up some valgrind warnings

r718
slightly wrong memory allocation in r717, fixes a potential crash with merange>32

r717
convert absolute difference of sums from mmx to sse2
convert mv bits cost and ads threshold from C to sse2
convert bytemask-to-list from C to scalar asm
1.6x faster me=esa (x86_64) or 1.3x faster (x86_32). (times consider only motion estimation. overall encode speedup may vary.)
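The "absolute difference of sums" (ADS) is useful because it is a lower bound on the SAD: |sum(cur) - sum(ref)| <= SAD(cur, ref). A candidate whose ADS is already no better than the best SAD found so far can be skipped without computing its SAD. The sketch below shows that principle only; the function names are hypothetical, and a real search would also work on sub-block sums and include the motion vector bit cost in the threshold.

```python
def sad(cur, ref):
    """Sum of absolute differences between two equally sized pixel lists."""
    return sum(abs(c - r) for c, r in zip(cur, ref))

def ads(cur, ref):
    """Absolute difference of sums: a cheap lower bound on sad(cur, ref)."""
    return abs(sum(cur) - sum(ref))

def sea_search(cur, candidates):
    """Return (best_index, best_sad, number_of_full_sad_evaluations)."""
    best_index, best_sad, evaluated = -1, float("inf"), 0
    for i, ref in enumerate(candidates):
        if ads(cur, ref) >= best_sad:
            continue  # the lower bound cannot beat the current best: skip
        evaluated += 1
        cost = sad(cur, ref)
        if cost < best_sad:
            best_index, best_sad = i, cost
    return best_index, best_sad, evaluated
```

Because the block sums can be precomputed for the whole frame, the ADS test costs far less than a SAD, so the search result is identical to an exhaustive one while touching fewer pixels.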

r716
round esa range to a multiple of 4

r715
use define _WIN32 instead of __WIN32__ or WIN32 defines.
MSDN reference: http://msdn2.microsoft.com/en-us/library/b0084kay(VS.80).aspx
Patch by BugMaster %BugMaster A narod P ru%
Original thread:
date: Dec 27, 2007 3:18 AM
subject: [x264-devel] VS2008 compilation error (need of replacement __WIN32__ with _WIN32)

r714
tweak x264_pixel_sad_x4_16x16_sse2 horizontal sum. 168 -> 166 cycles on core2.

r713
fix a nondeterminism involving 8x8dct, rdo, and threads.

r712
also test arch-specific x264_zigzag_* implementations in checkasm.c
Patch by Noboru Asai % noboru P asai A gmail P com%

r711
Add AltiVec implementation of
- x264_zigzag_scan_4x4_frame_altivec()
- x264_zigzag_scan_4x4ac_frame_altivec()
- x264_zigzag_scan_4x4_field_altivec()
- x264_zigzag_scan_4x4ac_field_altivec()
each around 1.3 to 1.8x faster than C version
Patch by Noboru Asai % noboru P asai A gmail P com%

r710
adds AltiVec implementation of predict_16x16_p()
over 4x faster than C version

r709
revert the x86_32 part of r708. elf shared libraries aren't important enough to be worth the extra lines of code to check for nasm.

r708
mark asm functions as hidden

r707
check whether ld supports -Bsymbolic before using it

r706
reduce the data type used in some tables. 16KB smaller exe.

r705
faster removal of duplicate mv predictors

r704
avoid a division in x264_mb_predict_mv_ref16x16.
patch by Dark Shikari.

r703
avoid a division in umh.
patch by Dark Shikari.

r702
fix a memleak in h->mb.mvr

r701
fix compilation as a shared library on x86_64 (regression in r696)

r700
add support for x86_64 on Darwin9.0 (Mac OS X 10.5, aka Leopard)
Patch by Antoine Gerschenfeld %gerschen A clipper P ens P fr%

r699
cover some more options in fprofile. (esa, bime, cqm, nr, no-dct-decimate, trellis2)
previously, esa was slower with fprofile than without, since gcc thought it wasn't important. now esa benefits like anything else.

r698
Add AltiVec implementation of x264_pixel_ssd_8x8, 3x faster than C version
Overall speed-up: 0.7% with --bframes 3 --ref 5 -m 7 --b-rdo
Patch by Noboru Asai %noboru P asai A gmail P com%

r697
limit mvs to [-512,511.75] instead of [-512,512]
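The asymmetric upper bound makes sense if motion vectors are stored as integers in quarter-pixel units, where 511.75 pixels is the largest value of a range that starts at -512. A minimal sketch under that assumption (the constant and function names are hypothetical):

```python
MV_MIN_QPEL = -512 * 4       # -512.00 pixels in quarter-pixel units
MV_MAX_QPEL = 512 * 4 - 1    # +511.75 pixels in quarter-pixel units

def clip_mv(mv_qpel):
    """Clamp one motion vector component (quarter-pixel units) to the range."""
    return max(MV_MIN_QPEL, min(MV_MAX_QPEL, mv_qpel))
```

With these bounds the range holds exactly 4096 values, so every legal component fits in a 12-bit signed integer.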

r696
avoid memory loads that span the border between two cachelines.
on core2 this makes x264_pixel_sad an average of 2x faster. other intel cpus gain various amounts. amd are unaffected.
overall speedup: 1-10%, depending on how much time is spent in fullpel motion estimation.
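Whether a given load spans two cachelines depends only on its start address, its width, and the cacheline size. The check can be sketched as follows; the 64-byte default and the function name are assumptions for the example, not values taken from the x264 source.

```python
def spans_cachelines(address, load_bytes, cacheline_bytes=64):
    """True if a load of load_bytes starting at address crosses a cacheline boundary."""
    first_line = address // cacheline_bytes
    last_line = (address + load_bytes - 1) // cacheline_bytes
    return first_line != last_line
```

During fullpel motion estimation the reference block can start at any byte offset, so unaligned loads that split this way are common, which is consistent with the speedup depending on how much time is spent there.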

r695
add cache info to cpu_detect. also print sse3.

r694
cosmetics: reorder mc_luma/mc_chroma/get_ref arguments for consistency with other functions

r693
separate pixel_avg into cases for mc and for bipred

r692
add AltiVec implementation of ssim_4x4x2_core, about 4x faster than C version.
Overall: 0.1-0.2% faster with default encoding settings
Patch by Noboru Asai %noboru P asai A gmail P com%

r691
Add AltiVec implementation of x264_hpel_filter. Provides a 10-11% overall speed-up with default encoding options
Patch by Noboru Asai %noboru P asai A gmail P com%

r690
cosmetics in dsp function selection

r689
remove sad_pde. it's been unused ever since successive elimination replaced it.

r688
cosmetics: use symbolic constants for frame padding radius

r687
move hpel_filter cpu detection to a function pointer like everything else

r686
cosmetics: use separate variables for frame width and stride

r685
Add AltiVec implementation of add4x4_idct, add8x8_idct, add16x16_idct, 3.2x faster on average
1.05x faster overall with default encoding options
Patch by Noboru Asai % noboru DD asai AA gmail DD com %

r684
add AltiVec implementation of dequant_4x4 and dequant_8x8, 2.8x faster than C,
1.01x faster than previous revision with default encoding options
Patch by Noboru Asai % noboru DD asai AA gmail DD com %

r683
Add AltiVec implementation of quant_2x2_dc,
fix Altivec implementation of quant_(4x4|8x8)(|_dc) wrt current C implementation
Patch by Noboru Asai % noboru DD asai AA gmail DD com %

r682
fix a possible nondeterminism with me=umh + threads.

r681
use hex instead of dia for rdo mv refinement. ~0.5% lower bitrate at subme=7.
patch by Dark Shikari.

r680
port sad_*_x3_sse2 to x86_64

r679
don't overwrite pthread* namespace, because system headers might define those functions even if we don't want them

r678
faster 4x4 sad

r677
fix an arithmetic overflow in trellis at high qp.

r676
implement multithreaded me=esa

r675
fix some integer overflows. now vbv size can exceed 2 Gbit.

r674
allow --vbv-init to take absolute values (in kbit), in addition to the previous fractions of vbv-bufsize.
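One way a single option can accept both forms is to treat values up to 1 as a fraction of the buffer and larger values as kbit. The sketch below assumes that rule; the threshold of 1, the function name, and the final clamp are illustrative assumptions rather than a description of x264's parser.

```python
def vbv_init_fraction(vbv_init, vbv_bufsize_kbit):
    """Interpret vbv_init as a fraction (<= 1) or as kbit (> 1); return a fraction."""
    if vbv_init > 1.0:
        vbv_init = vbv_init / vbv_bufsize_kbit  # absolute kbit -> fraction of buffer
    return min(1.0, max(0.0, vbv_init))         # never report more than a full buffer
```

For example, with a 1000 kbit buffer, both `0.5` and `500` would describe a half-full initial buffer.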

r673
remove a bashism

r672
reorder headers so that largefile support is defined before the first copy of stdio

r671
regression in r669: broke saving of configure args if make has to re-run configure

r670
regression in r669: --enable-shared should imply --enable-pic on some archs.

r669
* Add a --host flag to allow overriding config.guess; this is particularly
useful with a 64-bits kernel running a 32-bits userland to build 32-bits
apps.
* Normalize any host triplet into a quadruplet via config.sub.
* Move option parsing before any use of architecture information.

r668
* Update config.guess.

r667
mingw doesn't have strtok_r

r666
move os/compiler specific defines to their own header

r665
extend zones to support (some) encoding parameters in addition to ratecontrol.

r664
cosmetics

r663
limit vertical motion vectors to +/-512, since some decoders actually depend on that limit.

r662
Add vertical and horizontal luma deblocking accelerated with Altivec,
based on Graham Booker's code written for FFmpeg with slight modifications
to re-use x264's macros

r661
cosmetics in cpu detection

r660
fix compilation without asm on x86_32 (r658 worked only on x86_64).

r659
exempt 1080p from the non-mod16 warning.

r658
r657
r656
r655
require a ratecontrol method to be specified, it no longer defaults to cqp=26.

r654
fix nnz computation in cavlc+8x8dct+deblock. (regression in r607)

r653
fix the computation of bits used for vbv. (regression in r651)

r652
c89 compile fix

r651
cabac: use bytestream instead of bitstream.
35% faster cabac, 20% faster overall lossless, ~1% faster overall at normal bitrates.

r650
remove the restriction on number of threads as a function of resolution (it was wrong anyway in the presence of B-frames), and raise the max number of threads in general (though more will have to be done before it can really scale to lots of cores).

r649
tweak ssse3 quant

r648
change some tables from int to int8_t. 13KB smaller executable.

r647
faster cabac rdo. up to 10% faster at q0, but negligible at normal bitrates.

r646
workaround gcc's inability to align variables on the stack.
this crash was introduced in r642, but only because previous versions didn't use sse2 on the stack.

r645
32bit version of ssse3 satd.
switch default assembler to yasm. it will still fallback to nasm if you don't have yasm.

r644
simplify trellis

r643
fix an arithmetic overflow in trellis with QP >= 42

r642
2x faster quant. 2% overall.
side effects:
not bit-identical to the previous algorithm.
while the new algorithm covers a wider range of cqms than the previous one did,
I couldn't find a good way to fallback to a general version for the extreme
cqms. so now it refuses to encode extreme cqms instead of just being slower.
lays a framework for custom deadzone matrices, though I didn't add an api.

r641
when encoding with a cqm, probe_skip now also uses the cqm, instead of the flat matrix

r640
cosmetics in asm macros

r639
r638
in hpel search, merge two 16x16 mc calls into one 16x17. 15% faster hpel, .3% overall.

r637
Compile fix

r636
remove private stuff from public headers. no more need for -D__X264__

r635
adjust bitstream buffer sizes for very large frames

r634
conflate HAVE_MMXEXT with HAVE_SSE2, since they were never used distinctly.

r633
* Made -DNEED_ALTIVEC unnecessary, thanks to Guillaume Poirier.

r632
* check x264_cpu_detect() before calling AltiVec functions.

r631
ssse3 detection. x86_64 ssse3 satd and quant.
requires yasm >= 0.6.0

r630
* Use -maltivec when building dependencies, or <altivec.h> cannot be used.
* Do not declare vectors in non-AltiVec files.

r629
* common/cpu.c: runtime AltiVec autodetection on Linux.
* configure, Makefile: do not build the whole project with -maltivec because
it generates AltiVec code in weird places.

r628
fix a small memleak.
patch by Limin Wang.

r627
compile fix for GCC-3.3 on OSX, based on a patch by
Patrice Bensoussan % patrice P bensoussan A free P fr%
Note: regression tests still do not pass with GCC-3.3,
but they never did as far as I can remember.

r626
cosmetics in regression test

r625
r624
r623
oops, scenecut detection failed to activate when using threads and not using B-frames

r622
extras/getopt.c was BSD licensed. replace with a LGPL version (from glibc).

r621
Fix build issues on Linux. Only gcc-4.x is supported, as on OSX.
Cleans up a few inconsistencies in the code too.

r620
tweak block_residual_write_cavlc.
up to 1% faster lossless, no difference at normal bitrates.

r619
don't assume int is exactly 4 bytes

r618
make array_non_zero() compatible with -fstrict-aliasing

r617
Honor CFLAGS and LDFLAGS set by the user

r616
Check whether 'echo -n' works, otherwise try printf (fixes build on current OS X 10.5)

r615
Check version of nasm on OS X / Intel

r614
wrong reference frames were used with refs>=14 + pyramid (regression in r607)

r613
enable thread synchronization primitives on linux too

r612
fix a crash with x264_encoder_headers() + threads

r611
don't skip autodetection on configure --enable-pthread

r610
more win32threads -> pthreads

r609
cosmetics: rename list operators to be consistent with Perl, and move them to common/

r608
win32: use pthreads instead of win32threads. for some reason, pthreads is much faster.

r607
New threading method:
Encode multiple frames in parallel instead of dividing each frame into slices.
Improves speed, and reduces the bitrate penalty of threading.
Side effects:
It is no longer possible to re-encode a frame, so threaded scenecut detection
must run in the pre-me pass, which is faster but less precise.
It is now useful to use more threads than you have cpus. --threads=auto has
been updated to use cpus*1.5.
Minor changes to ratecontrol.
New options: --pre-scenecut, --mvrange-thread, --non-deterministic
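The cpus*1.5 rule for --threads=auto can be sketched in a few lines. The function name is hypothetical, and rounding down with a minimum of one thread is an assumption; the entry only states the factor.

```python
import os

def auto_thread_count(cpus=None):
    """Pick a thread count of 1.5x the CPU count (rounded down, at least 1)."""
    if cpus is None:
        cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, cpus * 3 // 2)
```

Using more threads than CPUs helps with frame-level threading because a thread can stall while waiting for the rows of its reference frame to be finished by another thread.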

r606
* Do not assume anything about sizeof(cpu_set_t).

r605
* Add support for kFreeBSD (FreeBSD kernel with GNU userland).

r604
Add Altivec implementations of add8x8_idct8, add16x16_idct8, sa8d_8x8 and sa8d_16x16
Note: doesn't take advantage of some possible aligned memory accesses, so there's still room for improvement

r603
Force alignment of the fake .rodata on MacIntel

r602
don't treat vbv_maxrate as a minrate too if it's higher than target average bitrate.

r601
Merges Guillaume Poirier's AltiVec changes:
* Adds optimized quant and sub*dct8 routines
* Faster sub*dct routines
~8% overall speed-up with default settings

r600
10% faster deblock mmx functions. ported from ffmpeg.

r599
checkasm: ignore insignificant differences in floating-point ssim

r598
display final ratefactor in abr when a loose vbv is applied. (still disabled in true cbr)

r597
fix parsing of --deblock %d,%d (beta was ignored)

r596
compute chroma_qp only once per mb

r595
rd refinement of intra chroma direction (enabled in --subme 7)
patch by Alex Wright.

r594
fix a crash in avc2avi

r593
skip deblocking and motion interpolation when using only I-frames

r592
cosmetics

r591
allow fractional values of crf

r590
prefetch pixels for motion compensation and deblocking.

r589
fix a crash on interlace + >8 reference frames

r588
no more decoder. it never worked anyway, and the presence of defunct code was confusing people.

r587
compute pskip_mv only once per macroblock, and store it

r586
slightly faster chroma_mc_mmx

r585
missing emms in plane_copy_mmx

r584
merge center_filter_mmx with horizontal_filter_mmx

r583
1.5x faster center_filter_mmx (amd64)

r582
mmx/prefetch implementation of plane_copy

r581
no more vfw

r580
gtk fixes:
in Makefile
- fix datadir for mingw users
- remove the shared lib during the clean rule
- use $(ENCODE_BIN) instead of x264_gtk_encode
- add some $(DESTDIR) and create some directories when necessary
- remove -lintl
statfile_length -> statsfile_length
fix the "sensitivity" of the widget of update_statfile
the logo is now handled correctly on windows
added: beginning of multipass support
patch by Vincent Torri.

r579
accept mencoder's option names as synonyms (api only, not in x264cli)

r578
simplify satd_sse2

r577
better error checking in x264_param_parse.
add synonyms for a few options.

r576
fix some strides that weren't a multiple of 16.

r575
tweak motion compensation amd64 asm. 0.3% overall speedup.

r574
strip local symbols from asm .o files, since they confuse oprofile

r573
add an option to control direct_8x8_inference_flag, default to enabled.
slightly faster encoding and decoding of p4x4 + B-frames,
and is needed for strict Levels compliance.

r572
allow custom deadzones for non-trellis quantization.
patch by Alex Wright.

r571
move zigzag scan functions to dsp function pointers.
mmx implementation of interlaced zigzag.

r570
support interlace. uses MBAFF syntax, but is not adaptive yet.

r569
allow --zones in cqp encodes

r568
cli: fix some typos in vui parameters from r542.
patch by Foxy Shadis.

r567
* Add an "all" rule to the Makefile. Ideally "default" should be renamed,
but I don't want to break existing scripts.

r566
workaround: on some systems, alloca() isn't aligned

r565
missing picpop

r564
fix a buffer overread from r540

r563
cosmetics (spelling)

r562
faster ESA

r561
faster ESA

r560
* Use the autotool's config.guess script instead of uname to check the
system and CPU types, to avoid issues when using for instance a 32-bit
userland on top of a 64-bit kernel.

r559
* Add the autotool's config.guess script so that we can use it instead
of uname in the configure script.

r558
10l in r553

r557
ssim broke on amd64 w/ pic.

r556
r555
support changing some more parameters in x264_encoder_reconfig()

r554
SSIM computation. (default on, disable by --no-ssim)

r553
configure: --enable-debug reduces optimization to -O1

r552
cosmetics

r551
gcc -fprofile-generate isn't threadsafe

r550
cli: move some options from --help to --longhelp

r549
cli: don't try to get resolution from filename unless input is rawyuv

r548
r542 broke --visualize

r547
Nicer OS X x264_cpu_num_processors (thanks David)

r546
Support OS X and BeOS in x264_cpu_num_processors

r545
Fixes contexts allocation with threads=auto

r544
select initial qp for abr and cbr based on satd and bitrate, rather than cq24.

r543
--threads=auto to detect number of cpus

r542
api addition: x264_param_parse() to set options by name

r541
fix a rare NaN in ratecontrol

r540
move quant_mf[] from x264_t to the heap, and merge duplicate entries

r539
GTK update. patch by Vincent Torri.
fixed:
cleaning of Makefile
time elapsed seems broken ('total time' label replaced by 'time remaining')
text entries of the status window are now not editable
added:
compilation from x264/ (add --enable-gtk option to configure)
shared lib creation if --enable-shared is passed to configure
x264gtk.pc
--b-rdo, --no-dct-decimate

r538
new option: --qpfile forces frames types and QPs.
(intended for ratecontrol experiments, not for real encodes)

r537
api change: select ratecontrol method with an enum (param.rc.i_rc_method) instead of a bunch of booleans.

r536
slightly faster mmx dct

r535
OpenBSD build fixes.
patch by Vizeli Pascal (pvizeli at yahoo dot de)

r534
mc_chroma width2 mmx

r533
make libx264.so symlink relative

r532
GTK update. patch by Vincent Torri.
added:
direct=auto
no-fast-pskip
vbv
cqm
tooltips (without descriptions yet)
translations
`make clean` for .exe
when file exists, ask for override
fixes:
debug level bug
bitrate slider bug
mixed-refs can be set only if ref>1
i8x8 can be set only if 8x8 transform is enabled
# of threads capped at 4
fourcc can't be removed
cosmetics

r531
vfw installer: tweak nsis compression.
patch by Francesco Corriga.

r530
Fixed typo that caused x264_encoder_open to always fail

r529
check some mallocs' return value

r528
make -> $(MAKE)

r527
convert non-fatal errors to message level "warning".

r526
fix a memory alignment. (no effect on x86, but might be needed for other simd)

r525
when using DEBUG_DUMP_FRAME, write decoded pictures in display order.
patch by Loic Le Loarer.

r524
non-referenced B-frames should have the same frame_num as the following ref frame, not the previous.
patch by Loic Le Loarer.

r523
set the SPS constraint_set[01]_flag based on the profile in use, just in case some decoder cares

r522
msvc doesn't like C99 named array initializers

r521
allow sar=1/1.
patch by Loic Le Loarer.

r520
faster intra search: filter i8x8 edges only once, and reuse for multiple predictions.

r519
faster intra search: some prediction modes don't have to compute a full hadamard transform.
x86 and amd64 asm.

r518
--sps-id, to allow concatenating streams with different settings.

r517
typo in expand_border_mod16

r516
typo impaired 2pass bitrate prediction.

r515
Let the user choose the compiler with "CC=xxx ./configure"

r514
More vector types fixes for gcc 3.3

r513
More vector casts to try and make compilers happier

r512
Use sa8d instead of satd for i8x8 search.
+.01 dB, -.5% speed
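Both metrics sum the absolute coefficients of a Hadamard transform of the prediction residual; satd works on 4x4 transforms while sa8d uses a single 8x8 transform, which matches the transform size of i8x8 blocks. The sketch below computes the unnormalized cost for any power-of-two block size. The function names are hypothetical, and x264's own functions also apply a normalization that is omitted here.

```python
def hadamard_1d(v):
    """Unnormalized fast Walsh-Hadamard transform; len(v) must be a power of two."""
    v = list(v)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b  # butterfly step
        h *= 2
    return v

def hadamard_cost(cur, ref):
    """Sum of absolute 2-D Hadamard coefficients of (cur - ref).

    cur and ref are square blocks given as lists of rows (4x4 or 8x8).
    """
    n = len(cur)
    diff = [[cur[y][x] - ref[y][x] for x in range(n)] for y in range(n)]
    rows = [hadamard_1d(r) for r in diff]                                   # transform rows
    cols = [hadamard_1d([rows[y][x] for y in range(n)]) for x in range(n)]  # then columns
    return sum(abs(c) for col in cols for c in col)
```

Passing 4x4 blocks gives a satd-style cost and passing 8x8 blocks gives an sa8d-style cost.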

r511
Before evaluating the RD score of any mode, check satd and abort if it's much worse than some other mode.
Also apply more early termination to intra search.
speed at -m1:+1%, -m4:+3%, -m6:+8%, -m7:+20%
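The early-termination idea can be sketched as a filter in front of the expensive RD evaluation. Everything specific below is hypothetical: the function name, the `satd_cost` and `rd_cost` callables, and the 1.25 threshold are placeholders, since the entry does not say how "much worse" is defined.

```python
def choose_mode(modes, satd_cost, rd_cost, threshold=1.25):
    """Pick the best mode by RD cost, skipping modes that look hopeless by SATD.

    modes: list of mode names; satd_cost / rd_cost: callables mapping a mode to a cost.
    Returns (best_mode, modes_that_were_rd_evaluated).
    """
    satd = {m: satd_cost(m) for m in modes}
    best_satd = min(satd.values())
    best_mode, best_rd, evaluated = None, float("inf"), []
    for m in modes:
        if satd[m] > best_satd * threshold:
            continue  # much worse by SATD: skip the expensive RD evaluation
        evaluated.append(m)
        cost = rd_cost(m)
        if cost < best_rd:
            best_mode, best_rd = m, cost
    return best_mode, evaluated
```

The trade-off is that a skipped mode might have won on RD cost, so the threshold controls how much quality is risked for speed.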

r510
* common/ppc/pixel.c: fixed illegal implicit casts of vector types.

r509
* Added %$#@#$! support for #@%$!#@ armv4l CPU.

r508
When evaluating predictors to start fullpel motion search, use subpel positions instead of rounding to fullpel.
about +.02 dB, -1.6% speed at subme>=3
patch by Alex Wright.

r507
mmx implementation of x264_pixel_sa8d

r506
10l in r463 (q0 i16x16 dc was permuted)

r505
typo in r504

r504
update msvc project files.
patch by anonymous.

r503
Before, we eliminated dct blocks containing only a small single coefficient. Now that behavior is optional, by --no-dct-decimate.
based on a patch by Alex Wright.

r502
Enables more aggressive optimizations (-fastf -mcpu=G4) on OS X.
Adds AltiVec interleaved SAD and SSD16x16.
Overall speedup up to 20%.
Patch by anonymous

r501
faster cabac_encode_bypass

r500
restored AltiVec dct

r499
more AltiVec mc, ~4.5% overall speedup

r498
slightly faster loopfilter

r497
3% faster satd_mmx

r496
cosmetics in sad/ssd/satd mmx

r495
store quoted configure options. needed e.g. for multiple args under --extra-cflags.

r494
fix a yasm-incompatible syntax in x86 asm

r493
yasm noexec stack

r492
more interleaved SAD.
25% faster halfpel.

r491
more interleaved SAD.
1% faster umh, 6% faster esa.

r490
interleave multiple calls to SAD.
15% faster fullpel motion estimation.
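Interleaving here means computing the SAD of one source block against several candidate blocks in a single pass, so each source pixel is loaded once instead of once per candidate. A scalar sketch of a four-candidate version follows; the function name and the flat-list block layout are assumptions.

```python
def sad_x4(cur, ref0, ref1, ref2, ref3):
    """SADs of one block against four candidates, reading each source pixel once."""
    s0 = s1 = s2 = s3 = 0
    for i, c in enumerate(cur):
        s0 += abs(c - ref0[i])
        s1 += abs(c - ref1[i])
        s2 += abs(c - ref2[i])
        s3 += abs(c - ref3[i])
    return s0, s1, s2, s3
```

Search patterns that test several neighbouring positions per step, such as the four points of a diamond, are a natural fit for this shape of function.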

r489
* Added support for ppc64. I'm really fucking tired of having to do this.

r488
use LDFLAGS when linking shared lib

r487
r486
GTK: support yuv4mpeg input.
patch by Vincent Torri.

r485
GTK: fix avs input
patch by Vincent Torri.

r484
cli: support yuv4mpeg input.
patch by anonymous.

r483
GTK: compilation fixes

r482
GTK: compilation fixes on mingw,
add avs input for the app (if available),
add filters for the filechooser,
add icon for the main window.
patch by Vincent Torri.

r481
GTK-based graphical frontend.
patch by Vincent Torri.

r480
silence some gcc warnings

r479
use FDEC_STRIDE instead of a parameter in mmx dct
.5% speedup

r478
* configure: support for 64 bits MIPS.

r477
10l in r473 and stdin

r476
RD subpel motion estimation (--subme 7)

r475
cosmetics in cabac_mb_cbf

r474
separate --thread-input from --threads

r473
if --threads > 1, then read the input stream in its own thread.

r472
FreeBSD uses ELF

r471
10l in r470 on x86_64

r470
some mmxext functions really only required mmx.

r469
simplify get_ref and mc_luma

r468
b16x16 wpred analysis used wrong weight

r467
configure: --enable-shared for libx264.so

r466
wrong modulus when delta_qp = +26
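For 8-bit H.264 the coded QP delta is restricted to [-26, +25] and the decoder reconstructs the QP modulo 52, so a difference of +26 has to be coded as -26. The sketch below illustrates that wrap-around; it is not the code changed in this revision, and the function names are hypothetical.

```python
QP_RANGE = 52  # number of QP values for 8-bit video (0..51)

def wrap_delta_qp(delta):
    """Map a QP difference onto the codable range [-26, +25] modulo 52."""
    return (delta + 26) % QP_RANGE - 26

def reconstruct_qp(pred_qp, delta):
    """Decoder-side reconstruction of the QP from its predictor and coded delta."""
    return (pred_qp + delta + QP_RANGE) % QP_RANGE
```

For example, going from QP 10 to QP 36 is a difference of +26, which wraps to -26, and `reconstruct_qp(10, -26)` yields 36 again.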

r465
10l in vbv + 2pass

r464
macroblock-level ratecontrol: improved vbv strictness, and improved quality when using vbv.

r463
keep transposed dct coefs. ~1% overall speedup.

r462
tweak rounding of 8x8dct

r461
cosmetics in makefile

r460
cosmetics: muxers -> muxers.c

r459
no --nr in intra blocks. intra prediction doesn't work well enough for the residual to be indicative of noise.

r458
10l in direct auto + multiref + 1pass

r457
--direct auto
selects direct mode per frame. works best in 2pass (enable in both passes).

r456
change default direct mode to spatial

r455
remove TODO. most of it is done, and the rest is out of date.

r454
more amd64 mmx intra prediction

r453
for i8x8 neighbors, don't assume a new slice starts at the edge of the frame

r452
* common/i386/i386inc.asm: got PIC to work for real on OS X x86.

r451
* common/i386/*.asm: don't use the "GLOBAL" reserved word, some versions
of NASM complain about it. Replaced it with "GOT_ebx".

r450
* configure: activate minor nasm optimisations, such as assembling
"add eax, 8" as "add eax, byte 8".

r449
* common/i386: factored the .rodata section declaration into i386inc.asm.

r448
* configure common/i386/i386inc.asm: got rid of -DFORMAT_* nasm flags
and use built-in preprocessor tests instead.

r447
* common/i386/i386inc.asm: tell the ELF linker about our stack properties
so that it does not assume the stack has to be executable.

r446
10l in r443 (p4x4 chroma)

r445
copy current macroblock to a smaller buffer, to improve cache coherency and reduce stride computations.
part 3: asm

r444
copy current macroblock to a smaller buffer, to improve cache coherency and reduce stride computations.
part 2: intra prediction

r443
copy current macroblock to a smaller buffer, to improve cache coherency and reduce stride computations.
part 1: memory arrangement.
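The memory arrangement can be pictured as copying one 16x16 block out of the frame, whose stride depends on the picture width, into a small buffer whose stride is a fixed constant. The sketch below uses flat Python lists; the function name and the buffer stride of 16 are assumptions for illustration, not x264's actual layout.

```python
MB_SIZE = 16

def copy_macroblock(frame, frame_stride, mb_x, mb_y, buf_stride=MB_SIZE):
    """Copy the 16x16 block at macroblock (mb_x, mb_y) from a flat frame buffer
    into a small flat buffer with a fixed stride."""
    buf = [0] * (buf_stride * MB_SIZE)
    src = mb_y * MB_SIZE * frame_stride + mb_x * MB_SIZE
    for y in range(MB_SIZE):
        row_start = src + y * frame_stride
        buf[y * buf_stride : y * buf_stride + MB_SIZE] = frame[row_start : row_start + MB_SIZE]
    return buf
```

After the copy, the rows of the block sit close together in memory, and code that works on it can hard-code the stride instead of receiving it as a parameter.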

r442
h->mc.copy()

r441
lowres intra used wrong neighboring pixels

r440
trellis=2 slightly affected intra analysis even without subme=6

r439
* encoder/ratecontrol.c: OS X support for exp2f and sqrtf.

r438
allow delta_qp > 26

r437
ratecontrol didn't always account for header bits, causing an undersize in multipass with --ratetol inf.

r436
-q0 --b-rdo wasn't lossless

r435
cosmetics

r434
allow ',' separator for --filter

r433
VfW: 10l in bime and refs

r432
more lowres mv clipping fixes

r431
VfW: cosmetics

r430
VfW: support trellis, brdo, nr, bime.
patch by Dan Nelson (dnelson at allantgroup dot com).

r429
amd64 mmx for some intra pred functions

r428
dequant_mmx made incorrect assumptions about extreme inputs. now uses 32bit in more cases.
patch by Christian Heine.

r427
lowres can reuse the normal mv cost table

r426
r422 broke x264_center_filter_mmxext


r425
* configure: define FORMAT_ELF under Linux and FORMAT_AOUTB under *BSD.

r424
* common/i386/i386inc.asm: support for ELF, a.out and Mach-O objects.

r423
* configure: added a --enable-pic flag.

r422
* Additional fixes to the PIC versions of assembly routines. They now pass
all checkasm tests and output streams are bit-by-bit identical, which
sounds good.

r421
* tools/checkasm.c: print the random seed used for the test, to allow for
replays. It looks like dequant_4x4 fails 1 time out of 600, with the
following seeds for instance: 1423 1957 2149 2455 3385 3403 3724 4095.

r420
cosmetics in mc_chroma


r419
* Oh, so what I thought was unused code was in fact used. This fixes my
breakage but makes the code rather slow in PIC mode. I will fix it later.

r418
* Support for x86 position-independent code (PIC), needed for dynamic libs
on Mac OS X Intel. I tried to make this as little intrusive as possible.

r417
msvc: #define isfinite()

r416
x86 mmx for some intra pred functions

r415
cosmetics: reorganize intra prediction dsp

r414
too many systems don't have off_t; use uint64_t instead.

r413
fix order of frame evaluation in pre-me

r412
update AUTHORS

r411
fix a check for NaN in ratecontrol

r410
fix mv predictors in pre-me for b-adapt.

r409
print --nr in sei params. tweak ratecontrol param checking.

r408
I've moved

r407
write correct VUI timing info

r406
early termination in UMH search

r405
split mv_range enforcement from edge-of-frame clipping. fixes an occasional artifact with long mvs.

r404
cosmetics: suppress warning on unused variables

r403
cosmetics: simplify #includes

r402
* configure: NSLU2 platform support (why oh why)

r401
Re-enabled x86 optims on MacIntel, assume Nasm CVS is installed and
-f macho -DPREFIX just seems to do the job

r400
Quick compile fix for OS X / Intel
Optimizations are disabled at the moment. In order to get them to
work, we'd need either nasm to be able to output Mach-O object files,
or we should convert the assembly code to something OS X can handle,
like gas.

r399
cli: large file support

r398
dct-domain noise reduction (ported from lavc)

r397
early termination within large SADs. ~1% faster UMH, ~4% faster ESA.
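The early-termination idea can be sketched as follows. This is a Python illustration only, not x264's C/asm code; the helper name `sad_early_exit` and the choice to check the running total once per row are assumptions.

```python
def sad_early_exit(cur, ref, best_so_far):
    """Sum of absolute differences between two equally sized blocks.

    Hypothetical helper: returns None as soon as the partial sum exceeds
    best_so_far, because this candidate can no longer beat the current best.
    """
    total = 0
    for cur_row, ref_row in zip(cur, ref):
        total += sum(abs(c - r) for c, r in zip(cur_row, ref_row))
        if total > best_so_far:
            return None  # early termination: skip the remaining rows
    return total
```

The saving comes from bad candidates, which are usually rejected after only a few rows.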

r396
mkv: increase nalu size size to 4 bytes.
patch by Haali.

r395
less 64bit math: 12% faster trellis

r394
more error checking of input parameters

r393
always write sps.vui

r392
use some extra packing modes for CQM headers.
fix typo in --cqm4p[yc].

r391
MSVC compatibility fixes

r390
joint bidirectional motion refinement (--bime)

r389
fix some overflows in mp4 timestamps.
patch by Francesco Corriga.

r388
Successive elimination motion search: same as exhaustive search, but 2-3x faster.
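A minimal sketch of why successive elimination gives the same answer as exhaustive search: |sum(cur) - sum(cand)| is a lower bound on SAD(cur, cand), so any candidate whose bound is not below the best SAD so far can be skipped without computing its SAD. The function name `sea_search` and the flat-list block layout are assumptions for illustration; a real encoder would precompute the block sums instead of calling sum() per candidate.

```python
def sea_search(cur, candidates):
    """Return (best_index, best_sad) over candidate blocks (flat pixel lists)."""
    cur_sum = sum(cur)
    best_idx, best_sad = None, float("inf")
    for idx, cand in enumerate(candidates):
        # Lower bound on the SAD; if it cannot beat the best, skip the full SAD.
        if abs(cur_sum - sum(cand)) >= best_sad:
            continue
        sad = sum(abs(a - b) for a, b in zip(cur, cand))
        if sad < best_sad:
            best_idx, best_sad = idx, sad
    return best_idx, best_sad
```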

r387
Fixed cc_check on OS X (gcc -o /dev/null always fails)

r386
postpone pskip decision until after p16x16ref0 motion search.
reduces the number of erroneous pskips in low-detail regions.

r385
configure: autodetect gpac, avis, pthread, vfw

r384
--no-fast-pskip
patch by Alex Wright.

r383
cosmetics: config.h is now modified only by configure. make now calls configure if you haven't.

r382
MP4: set "track enabled" flag.
patch by Robert Swain.

r381
faster subpel motion search.
patch by Alex Wright.

r380
don't use gnu extensions to grep and sed.

r379
pkg-config: major.minor.patch version

r378
`make fprofiled` to automate gcc -fprofile-generate/use

r377
10l

r376
param.b_repeat_headers (not yet used)

r375
support pkg-config.
patch by Caro.

r374
write encoding options to the userdata SEI and to the 2pass statsfile.
check for incompatible options in the 2nd pass.

r373
change default level to "5.1"

r372
skip dequant+idct of decimated blocks.

r371
after a 1pass ABR, print the value of --crf which would result in the same bitrate.

r370
subpel search: always check mvp.

r369
faster b-rdo (skip RD of modes with bad SATD).
patch by Alex Wright.

r368
RD mode decision for B-frames (--b-rdo)
patch by Alex Wright.

r367
* common/amd64/quant-a.asm: added missing GLOBAL flags that prevented PIC
builds, thanks to Anssi Hannula.

r366
* configure: added the Alpha platform.

r365
use array_non_zero() when we don't need a full array_non_zero_count()

r364
mmx dequant. up to 3% speedup w/ RD.

r363
allow --level to understand names in addition to idc

r362
check (most of) the level constraints.
set default max_mv_range based on level_idc.

r361
if p16x16 RD decides to code a MB as p_skip, then don't check smaller partitions.

r360
Trellis RD quantization.
around +.2 dB

r359
cosmetics: XCHG macro

r358
skip a few duplicate candidates in qpel search.

r357
skip a few duplicate candidates in fullpel hex&umh search.

r356
cli: arithmetic overflow in bitrate printing

r355
cosmetics in x264_cabac_mb_type

r354
X264_ABS => abs

r353
amd64 sse2 8x8dct. 1.45x faster than mmx.

r352
allow 1pass ratecontrol with keyint=1

r351
cli: print estimated time left in --progress

r350
doc/ratecontrol.txt

r349
rm doc/dct.txt

r348
in constant QP mode, write that QP in the PPS to save a few bits in each slice header.

r347
faster decimation

r346
cosmetics: fix an erroneous warning from r340.

r345
cosmetics: change literal cabac_block_cat to an enum.

r344
cabac: merge i_state with i_mps. bs_write multiple bits at once.

r343
remove unused adaptive cabac_idc code

r342
Fixed compilation on PPC (spotted by David Wolstencroft)

r341
mmx deblocking.
2.5x faster deblocking functions, 1-4% overall.

r340
If frame count is known at init time (cli & vfw), then abort if the 2nd pass
exceeds the length of the 1st pass.
If it's not known (mencoder), then report a non-fatal error when we run off the
end of the 1st pass stats, and switch to constant QP.

r339
move checkasm to tools/
delete unused stuff in testing/
`make clean` deletes checkasm and avc2avi

r338
checkasm: check 8x8dct, mc average, quant, and SSE2.

r337
r336 broke amd64 x264_pixel_sad_16x16_sse2 (though it's not being used)

r336
Windows 64bit asm.
patch by squid_80.

r335
delete build/cygwin because it's handled in the main configure/makefile.

r334
--crf: 1pass quality-based VBR.

r333
Added --enable-gprof (patch by Johannes Reinhardt)

r332
cosmetics: remove #if0'ed code
patch by Robert Swain.

r331
faster bs_write

r330
during RDO, skip the bitstream writing and just calculate the number of bits
that would be used. speedup: cabac +4-8%, cavlc +2-4%.

r329
Use SAD instead of SATD for halfpel motion search.
Move multiref termination after halfpel search.
Total: 3-7% speedup and +/-.02 dB.
patch by Alex Wright.

r328
VfW: mixed refs.
patch by celtic_druid.

r327
allow non-mod16 resolutions

r326
VfW: prevent duplicate free() in compress_end()

r325
cosmetics: remove declarations of nonexistent asm functions

r324
cosmetics (whitespace) in VfW

r323
VfW: some reorganization
patch by Francesco Corriga.

r322
cosmetics: merge some duplicate tables

r321
remove cabac byte-stuffing code, because it just wastes bits in lossless, and does nothing at all at sane bitrates.

r320
don't allocate lowres planes if they won't be used (i.e. in the 2nd pass).

r319
cosmetics: move some stuff from macroblock_encode to cache_save

r318
new option: --mixed-refs
Allows each 8x8 or 16x8 partition to independently select a reference frame, as opposed to only one ref per macroblock.
patch mostly by Alex Wright (alexw0885 at hotmail dot com).

r317
cosmetics in option parsing

r316
expose the rest of the VUI flags.
patch by Christian Heine.

r315
* common/amd64/mc-a.asm: use RIP-relative addressing in PIC mode.

r314
temporal predictors for 16x16 motion search.

r313
slightly faster/cleaner block_residual_write_cabac

r312
cosmetics

r311
cli: fix a crash on piped input.

r310
stats summary: separately report all 5 partition sizes, and add ref usages

r309
disposable frames shouldn't get their own coded_frame_num.

r308
typo in ia32 x264_pixel_avg_weight_w8_mmxext

r307
mmx avg (already existed but not used for bipred)
mmx biweighted avg (3x faster than C)

r306
cosmetics: move avg function ptrs from pixf to mc.

r305
with B-pyramid, forget old refs in POC order instead of coded order.
(before, b_skip was unavailable with pyramid and ref=1)

r304
typo in r296.
patch by lurui.

r303
* common/amd64/*.asm: use RIP-relative addressing in PIC mode.

r302
* common/amd64/mc-a.asm: removed useless global variables

r301
* configure: support extra $(ASFLAGS) through --extra-asflags.

r300
reorganized VfW UI.
patch by Antony Boucher, graphic by Jarod.

r299
MP4 output: update to GPAC 0.4 API.
patch mostly by Robert Swain.

r298
faster mmx quant 15bit, and add 16bit version. total speedup: ~0.3%
patch by Christian Heine.

r297
faster mmx satd. *x16: 20%, *x8: 10%, total: 2-4%.
ia32 patch by Christian Heine, amd64 port by me.

r296
allow i4x4 and i8x8 down-left prediction with emulated top-right samples.
based on a patch by Johannes Reinhardt (Johannes dot Reinhardt at uni-konstanz dot de)

r295
r294
* configure: added support for ia64, mips/mipsel, m68k, arm, s390 and hppa
platforms, as well as linux sparc.

r293
MMX quantization functions, and optimization of the C versions.
about 3x faster quant_8x8, quant_4x4, quant_4x4_dc, and quant_2x2_dc. total speedup: 4-10%.
patch by Alexander Izvorski and Christian Heine.

r292
SSE2 pixel comparison functions
P4: SAD 16x*, SSD 16x*, SATD 16x*: 30% faster, SATD 8x8: 15% faster, total: 2-4% faster
K8: SSD 16x*: 6% faster, total: not much
patch by Alexander Izvorski.

r291
10l in rev290: duplicate declaration of x264_pixel_sub_8x8_mmx.

r290
mmx 8x8 dct.
On a K8: sub16x16_dct8 3806->1461, add16x16_idct8 4852->1297 cycles. total speedup: 1-3%.
patch by Christian Heine (sennindemokrit at gmx dot net)

r289
VC++ fix (thx fenrir)

r288
x264.h: issue an explicit warning when neither stdint.h nor inttypes.h
has been included before x264.h

r287
VfW: SAR wording. patch by Sharktooth.

r286
cli: workaround to allow "--ratetol inf" on win32.

r285
r284
* all: Patch by Mike Matsnev :
"The following things were fixed:
* AR calculation was broken on previous import
* Wrong conditional in write_nalu_mkv() was fixed
* Error checking was added in all places"

r283
xyuv: bug fixes + autodetect of video size.

r282
Run ranlib after make install (OS X needs that)

r281
update i_mb_b16x8_cost_table[] for I8x8 mb type (r278 only fixed a symptom).

r280
* all: Added matroska writing. Patch by Mike Matsnev.

r279
* pixel.*:
"I have completed additional SAD implementations (8x16, 16x8 and 16x16)
using Sparc VIS. Overall speedup is roughly 90% from straight C. I'm
doing development and testing on a Sun Fire V220, with 2 * 1.5GHz
UltraSPARC-III CPUs.
I've hand-unrolled each of the loops. Sun's assembler does not appear
to have macro functionality built-in and I didn't want to establish an
external dependency on m4. Please let me know if you run into any
trouble with the patch."
Patch by Phil Jensen.

r278
analyse: "It corrects the size of array i_mb_b16x8_cost_table
from 16 to 17; otherwise, it can result in a mismatch of the b16x8
mb type cost and a memory read overflow." Patch by lurui.

r277
* x264 compilation on NetBSD. Patch by Mike Matsnev.

r276
* all: "8x8 SAD written in Sparc Assembly using VIS." Patch by Phil Jensen.

r275
10l: rd score for sub-8x8 partitions used wrong mvs.

r274
faster SAD_INC_2x16P for amd64.
patch by Josef Zlomek.

r273
Fixed win32 handle leakage (thanks Trax)
Default enabled support of threads on BeOS

r272
* Add support for UltraSparc (uname -m: sun4u) with Solaris.
Patch by Tuukka Toivonen.

r271
* Faster SAD_INC_2x16P. Patch by Alexander Izvorski.

r270
example quant matrix file

r269
--cqmfile reads quant matrices in a JM-compatible format.

r268
adjust coded buffer size based on input resolution and QP (old default wasn't enough for HD lossless)

r267
update avc2avi for high profile

r266
custom quant matrices

r265
VfW: workaround a windows unicode bug.
patch by Leowai.

r264
lossless mode enabled at qp=0

r263
VfW: enable RDO. some option dependencies.
patch by Francesco Corriga.

r262
rate-distortion optimized MB types in I- and P-frames (--subme 6)

r261
more VfW options.
patch mostly by celtic_druid.

r260
VFW: 8x8 transform, SAR.
patch by celtic_druid.

r259
threads option in vfw.
patch by celtic_druid.

r258
win32 threads enabled by default

r257
vfw installer nsis script.
patch by Francesco Corriga.

r256
print 8x8 transform usage % in stats summary.

r255
revert 216, another try at max_dec_frame_buffering.
disable adaptive cabac_idc by default; 0 is always best anyway.

r254
typo in cabac tables

r253
cosmetics

r252
fix i8x8 decision with chroma_me

r251
SATD-based decision for 8x8 transform in inter-MBs.
Enable 8x8 intra.
CLI options: --8x8dct, --analyse i8x8.

r250
Use win32 native threads (you still have to --enable-pthread to use
them, though)

r249
slightly faster 8x8 dct

r248
remove unused tables from SPS/PPS. reduces overhead when syncing threads.

r247
10l (debug stuff in 246)

r246
8x8 transform and 8x8 intra prediction.
(backend only, not yet used by mb analysis)

r245
cosmetics

r244
fix a bug with cabac + B-frames + mref + slices.
call visualization per frame instead of per slice.

r243
accept the standard --prefix etc. options

r242
tweak cflags

r241
Fixed multithreading on BeOS (pthread emulation required)

r240
multithreading (via slices)

r239
move zones parsing to ratecontrol.c; allows passing in zones as a string.

r238
UMHex motion search (but no early termination yet)

r237
Zoned ratecontrol.

r236
fix rounding of intra dequant when qp<=3

r235
API: x264_encoder_reconfig(). (not yet used by any frontend)

r234
Makefile: in target "install", first create the directories if they
don't already exist

r233
Optimized subXxX_dct

r232
s/==/=/

r231
ppc/: compile fixes for Linux/PPC (courtesy of Rasmus Rohde) and
for gcc < 4

r230
visualize reference pic numbers. misc cleanups in visualization.
patch by Tuukka Toivonen.

r229
ppc/*: more tuning on satd (+5%)

r228
CLI option: --seek

r227
CLI option: --visualize
Displays the encoded video along with MB types and motion vectors.
patch by Tuukka Toivonen.

r226
fix an uninitialized value in slicetype_analyse

r225
port recent MC asm changes to amd64.
patch by Josef Zlomek.

r224
ppc/*:
+ Removed unused code
+ Optimized mc chroma 4xH and satd 8x4 and 4x8
+ Won a bunch of cycles by not trusting gcc about inlining and
unrolling properly
(about 17% faster globally)

r223
New ratecontrol options:
1pass ABR. VBV constraint for ABR and 2pass.
There is no longer a dedicated CBR mode: use ABR+VBV.
VfW now uses ABR instead of CQP for 1st of multipass.

r222
use a predicted mv as starting point for subpel refinement.

r221
slight speedup in halfpel interpolation.
patch by Mathieu Monnier.

r220
Cleaner allocation of tmp space in halfpel interpolation; fixes some valgrind/nasm warnings.
patch by Mathieu Monnier.

r219
"2pass failed to converge" is no longer considered fatal.

r218
Updated MSVC project files.
thanks to Bonzi.

r217
cosmetics.
silence some gcc warnings.
amd64 doesn't need a separate copy of the c/h files, only the asm.

r216
10l (214 wrote wrong DPB size in SPS -> B-pyramid broke)

r215
CLI (mp4): return to 'capture' output mode, remove useless SetCtsPackMode() (fixed in gpac).
Note: requires gpac cvs-20050419 or later.
patch by bobo.

r214
combined L0 & L1 reference lists are limited to a total of 16 pics.

r213
amd64 asm patch, part2.
by Josef Zlomek ( josef dot zlomek at xeris dot cz )

r212
amd64 asm patch, part1.

r211
Allow manual selection of fullpel ME method. New method: Exhaustive search.
based on a patch by Tuukka Toivonen.

r210
misc makefile changes.
propagate --extra-cflags to vfw.
'make clean' removes x264.exe and vfw.
tweak dependencies.

r209
10l (CLI: fflush after progress update)

r208
CLI: progress indicator

r207
VfW: build from main makefile

r206
[mp4] ftyp & moov boxes at the beginning of the file (thanks to jeanlf
for comments)
patch by bobololo

r205
CLI: --fps had side-effects. fixed.

r204
CLI: cosmetics

r203
Makefile: strip x264cli.
tweak stats summary.

r202
* x264.c: Fix ctts box creation. Patch by bobololo from Ateme.

r201
common/ppc: more cleaning, optimized a bit

r200
CLI: require output file (don't default to stdout). warn if trying to use mp4 or avis when not supported. misc cleanup.

r199
configure: use -falign-loops=16 on OS X
common/ppc/: added AltiVecized mc_chroma + cleaning
checkasm.c: really fixed MC tests

r198
Configure tweaks. Allow avis-input in mingw. Turn off debug by default.

r197
checkasm.c: fixed MC tests

r196
CLI: MP4 muxing.
patch by bobo from Ateme.

r195
Cygwin fixes

r194
configure: ooops, restored -g
ratecontrol.c: OS X has exp2f in -lmx
checkasm: quick compile fix

r193
add x86_64 to configure

r192
set svn:ignore

r191
Added a configure to detect the platform/system/etc so people don't
have to edit the Makefile (will work for Linux/OS X/BeOS/FreeBSD, feel
free to modify for others), and we can now remove the Jamfile which
was broken most of the time anyway.

r190
Makefiles: better dependencies for SEI version number

r189
Forgot rbsp_trailing_bits in AUD NAL

r188
Optionally use access unit delimiter NAL units.

r187
VfW: cleaner install on win98.
patch by Riccardo Stievano.

r186
new util: countquant for 2pass statsfiles

r185
print svn version number in SEI info and in CLI/VfW.

r184
Make reconstructed frame available to caller.

r183
make install

r182
free() -> x264_free()

r181
CLI: flush B-frames at the end of the encode

r180
convert mc's inline asm to nasm (slight speedup and msvc compatibility).
patch by Mathieu Monnier.

r179
buffer overruns in slicetype_decision.
patch by Mathieu Monnier.

r178
tweak usage message

r177
Simplify inter analysis option names. (psub16x16 -> p8x8)
patch by Robert Swain.

r176
173 broke .depend when debugging was enabled

r175
early termination for intra4x4 analysis

r174
Check/fix range of x264_param_t.rc.i_qp_constant.

r173
Cleaned up and fixed Makefile for OS X and BeOS (hopefully FreeBSD too)
It defaults for x86/linux, others: uncomment the lines for your
platform & OS at the beginning of the Makefile

r172
macroblock_analyse: simplify cost comparisons. (cosmetic)
CLI: enable cabac by default.

r171
Chroma ME (P-frames only).

r170
SSE optimized chroma MC.
patch by Radek Czyz.

r169
167 broke psnr calculation for non-mod-32 inputs

r168
sqrtf requires -lmx on Mac OS X

r167
use mmx ssd for psnr calculation.

r166
revert 164. blame Spyder.

r165
SSD comparison function (not yet used).
Cosmetics in mmx SAD.

r164
VfW: reject YUY2 and RGB input formats

r163
Really fix QP override.

r162
write VUI bitstream restrictions

r161
AVI & Avisynth input (win32 only).
patch by bobo from Ateme.

r160
expose option "chroma qp offset"

r159
Fix per-frame QP override broken in rev 137.

r158
Don't include x264.o in the library.

r157
VfW: expose B pyramid and weighted B prediction.
patch by Riccardo Stievano.

r156
10l

r155
buffer overrun when bframes == X264_BFRAME_MAX

r154
Adaptive B skipped some POC numbers (slightly reducing b_direct efficiency).

r153
avc2avi:
Use POC to determine frame boundaries (frame_num couldn't distinguish consecutive B-frames).
Fix keyframe flag to mark IDR only, not all I slices.

r152
allow 16 refs (instead of 15)

r151
report version number in decimal instead of hex

r150
New option: "B-frame pyramid" keeps the middle of 2+ consecutive B-frames as a reference, and reorders frames appropriately.

r149
smarter parsing of resolution from commandline

r148
ratecontrol.c: fixed exp2f on BeOS so rate control works properly

r147
Fix a buffer overrun with very long MVs.

r146
wrong stride in lowres image

r145
10l (fast1stpass was slower than non-fast)

r144
Disable deblocking filter in frames of sufficiently low QP that it would have no effect. (Saves a little CPU time in the decoder.)

r143
Simplify x264_frame_expand_border.

r142
Altivec functions for MC using the cached halfpel planes.
Patch by Fredrik Pettersson <fredrik_pettersson at yahoo dot se>.

r141
Don't use uninitialized MVs in x264_mb_predict_mv_ref16x16.

r140
Implicit weights in B16x16 analysis were swapped.
patch by Radek Czyz.

r139
Cosmetics: Some renaming. Move the rest of slice type decision from encoder.c to slicetype_decision.c

r138
Take into account keyint_max in B-frame decision.

r137
Preliminary adaptive B-frame decision (not yet tuned).
Fix flushing of delayed frames when the encode finishes.

r136
Write x264's version in a SEI message.

r135
VfW: Enable weighted B prediction when max B-frames > 1. Enforce max reference frames <= 15.
patch by Riccardo Stievano.

r134
Add: implicit weighted prediction for B-frames.
Slightly optimize x264_mb_mc_01xywh.
Fix an error in B16x8 cost.

r133
Oops, increment API number.

r132
Configurable level. Levels are still not enforced; it's up to the user to select a level compatible with the rest of the encoding options.
Patch by Jeff Clagg <snacky at ikaruga dot co dot uk>.

r131
Always use the tempfile and rename method for multipass stats, so that VfW knows whether the previous pass completed.

r130
More tweaks to bitrate prediction.
Change error messages when 2pass fails to converge.

r129
Improved 2pass bitrate predictor. No real change most of the time, but allows correct ratecontrol on some pathological videos that used to diverge completely. Also improves prediction when 2nd pass bitrate is very different from 1st pass.
The new qscale2bits() has no simple inverse, so I also had to change rc_eq to output qscale instead of bits.

r128
Some defines needed by MSVC, and convert the DSP files to DOS-style newlines.
Patch by Radek Czyz.

r127
Precalculate lambda*bits for all allowed mvs. 1-2% speedup.

r126
Deblock B-frames. (Not yet used, since B-frames aren't kept as references.)

r125
Simplify x264_mb_mc_01xywh()

r124
Save some memcopies in halfpel ME.
Patch by Radek Czyz.

r123
Cache half-pixel interpolated reference frames, to avoid duplicate motion compensation.
30-50% speedup at subq=5.
Patch by Radek Czyz.

r122
In N-pass mode if stat_in and stat_out are the same file, instead save to a temp file and overwrite stat_in only when the encode finishes.
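The temp-file-then-overwrite approach can be sketched like this. It is a Python illustration under assumptions: the helper name `write_stats_atomically`, the ".temp" suffix, and the line-based stats format are hypothetical, not x264's actual implementation.

```python
import os
import tempfile

def write_stats_atomically(path, lines):
    """Write new stats next to `path`, replacing it only once writing succeeds."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".temp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.writelines(line + "\n" for line in lines)
        # The old stats file stays readable until this point.
        os.replace(tmp_path, path)
    except BaseException:
        # Encode failed or was interrupted: keep the previous stats intact.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the encode is aborted halfway, the stats from the previous pass are still on disk and still complete.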

r121
VfW: x264_log now creates a window for error messages

r120
cosmetics

r119
bs_align_1() didn't actually write all ones. (so encoded streams with cabac were technically invalid, though no decoder cares.)
Patch by Tuukka Toivonen.

r118
VfW: tweak option names

r117
VfW: use separate stats files for each pass of an N-pass encode.

r116
VfW: Enable multipass by default, increase the configurable range of I and B quant ratios.
core: Tweak error messages.

r115
r114 didn't completely fix the problem, trying again.

r114
Another MV clipping fix.

r113
Simplify x264_cabac_mb_type.

r112
More accurate clipping rectangle for motion search. (slight compression improvement for high-motion scenes)

r111
encoder/encoder.c: gcc < 3 compile fix

r110
Change default level from 2.1 to 4.0 until I get around to calculating actual levels.

r109
Clipping mvs to within picture + emulated border when running motion compensation.

r108
Fix clipping of mvs in probe_pskip. (Previously it mixed up fullpel with qpel.) This should eliminate the black blocks that sometimes appeared in high motion, low detail scenes.

r107
Fix length of strings stored in the registry.
Patch by Riccardo Stievano.

r106
registry values for min/max keyint were mixed up

r105
VfW: expose option "Nth pass" (i.e. simultaneously read and update the multipass stats file).
Patch by Riccardo Stievano.

r104
add "make NDEBUG=1" to strip library

r103
finish subpixel motion refinement for B-frames (up to 6% reduced size of B-frames at subq <= 3)

r102
VfW: expose the 2pass ratecontrol option: qcomp ("bitrate variability").
Some rearranging of the advanced configuration dialogue.
Patch by Riccardo Stievano <walkunafraid at tin dot it>.

r101
VfW: Support ip_factor and pb_factor, some cleanups.
patch by Riccardo Stievano <walkunafraid at tin dot it>

r100
Use floats instead of int64 in log messages, since win32 (incl. mingw) doesn't understand %lld.
Also display MB statistics in percent instead of number.

r99
finished printf -> x264_log conversion.

r98
Don't apply keyframe boost to I-frames that are followed by another I.

r97
New VfW option: "fast 1st pass" automatically disables some partitions and reduces ME quality and number of reference frames.
Removed option direct_pred=none, since it provides no benefits.
Patch by Riccardo Stievano <walkunafraid at tin dot it>.

r96
vfw: tweak wording and defaults

r95
From Riccardo Stievano <walkunafraid at tin dot it>:
here's a patch that fixes the VfW frontend after the changes made in
revision 93 (GOP size management). Default values for i_keyint_max
and i_keyint_min have been set to 250 and 10, respectively.

r94
My last change of IDR decision broke in 2pass mode. fixed by remembering which frames are IDR.
Disable benchmarking, as it was very slow for some people, and we already know that all the time is spent in macroblock analysis.

r93
Changes the mechanics of max keyframe interval:
Now enforces min and max GOP sizes, and allows variable numbers of
non-IDR I-frames within a GOP.

r92
MinGW compatible resource.rc by Radek Czyz

r91
strict QP offset for B-frame vs following P-frame
strict QP offset for I-frame vs GOP average

r90
r72 broke B-frames without intra4x4. fixed.

r89
updated VfW interface by Radek Czyz

r88
improved mv prediction: 1-3% better compression of B-frames
early termination for B-frame ref search: up to 20% faster with lots of refs.

r87
allow constant qp on Nth pass (e.g. for forcing frame types)

r86
disable subme=0 (the huge bitrate penalty wasn't worth the speed)
renumber direct_pred

r85
oops, last patch had some debug statements

r84
fix: "x264 -A all" didn't include b8x8 types.
add: "make NDEBUG=1" to strip library
update TODO with B-frame status

r83
Reorganize frame type selection: No longer produces consecutive I-frames when B-frames are enabled. Not thoroughly tested, but works for me.
Fix scenecut detection when B-frames are present: Can now produce IDR, but is slower since it re-encodes more frames. This might reduce compression ratio in the presence of quick fade-ins.
2pass ratecontrol deals more gracefully with completely skipped frames.

r82
remove Makefile.cygwin because build/cygwin/Makefile is more up to date.
put correct object file names in .depend

r81
reduce default verbosity, add option -v

r80
remove relative include paths, to avoid conflicts with libtool

r79
rename *.asm to avoid conflicts with libtool

r78
list default settings in --help

r77
replace EPZS diamond with a hexagon search pattern.
early termination for multiple reference frame search (up to 1.5x faster).

r76
sps->i_num_ref_frames was set higher than necessary

r75
new option: --fps

r74
various cleanups in macroblock caching.
store motion data for each reference frame (but not yet used).

r73
more accurate cost for psub8x8 modes.

r72
implement macroblock types B_16x8, B_8x16
tweak thresholds for comparing B mb types

r71
simplify x264_mb_predict_mv_direct16x16_temporal

r70
option '--frames' limits number of frames to encode.
patch by Tuukka Toivonen <tuukkat at ee.oulu.fi>

r69
simplify cavlc mb type

r68
implement macroblock types B_SKIP, B_DIRECT, B_8x8

r67
rename 'core/' to 'common/', which avoids conflicts with libtool

r66
cleanup stats reporting
report B macroblock types
report average QP

r65
apply ip_factor and pb_factor in constant quantiser encodes.

r64
save a little bit of memory

r63
multiple hypothesis mv prediction:
1-3% improved compression, and .5-1% faster

r62
* analyse: we can do 4x4 Horizontal Up mode when LEFT is available.

r61
improved 2pass ratecontrol:
ensures that I-frames have comparable quantizer to the following P-frames,
and produces more consistent quality in areas of fluctuating complexity.

r60
more informative error message when 2pass fails to converge

r59
#include <stdarg.h>

r58
cleanup spacing of frame stats with verbose logging.

r57
typo in x264_cabac_mb_sub_b_partition
(see ITU-T H.264 clause 9.3.3.1.2)

r56
Typo

r55
+ No need to emulate memalign on OS X
+ Fixed Makefile for OS X
(Original patch by Peter Handel)

r54
Conditionally inits 1pass rc, only if it's enabled.
This prevents a couple of irrelevant warnings from appearing in
constant QP mode. (Loren Merritt <lorenm at u dot washington dot edu>)

r53
Oops, changing those types messed up some vprintf's. fixed.
(Loren Merritt <lorenm at u dot washington dot edu>)

r52
filesize (bits) in a 32 bit int will overflow after 250MB, screwing up
2pass ratecontrol.
(patch by Loren Merritt <lorenm at u dot washington dot edu>)
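The arithmetic behind that limit, as a small sketch. It assumes a signed 32-bit counter that wraps in two's-complement fashion, which is typical but not guaranteed in C; the helper names are hypothetical. The exact threshold works out to about 256 MiB, which the commit message rounds to 250MB.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit int can hold

def max_file_bytes_before_overflow():
    # The counter stores bits, so divide by 8 to get the file size in bytes.
    return INT32_MAX // 8

def wrap_int32(value):
    """Emulate two's-complement wraparound of a signed 32-bit counter."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value
```

For example, a 300 MiB file is 300 * 1024 * 1024 * 8 bits, which `wrap_int32` turns into a negative number, the kind of value that would confuse a bitrate calculation.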

r51
fix compilation on FreeBSD (from Loren Merritt (thanks to Igla))

r50
* ratecontrol: Patch by Loren Merritt :
" This patch
* calculates average QP as a float, providing slightly improved
ratecontrol if the first pass was CBR.
* fixes the reported QP if you set both b_stat_read and b_stat_write,
allowing 3 pass encoding (or just examination of the 2nd pass's stats)."

r49
* all: Patch by Loren Merritt.
" This patch makes scene-cut detection based on the relative cost of I-frame
vs P-frame, rather than just on the number of I-blocks used.
It also makes the scene-cut threshold configurable.
This doesn't have a very large effect: Most scene cuts are obvious to
either algorithm. But I think this way is better in some less clear cut
cases, and sometimes finds a better spot for an I-frame than just waiting
for the max I-frame interval."

r48
* ratecontrol: added 'b' flag to fopen.

r47
* all: Patches by Loren Merritt:
"Improved patch. Now supports subpel ME on all candidate MB types,
not just on the winner.
subpel_refine: (completely different scale from before)
0 => halfpel only
1 => 1 iteration of qpel on the winner (same as x264 r46)
2 => 2 iterations of qpel (about the same as my earlier patch, but faster)
3 => halfpel on all MB types, qpel on the winner
4 => qpel on all
5 => more iterations
benchmarks:
mencoder dvd://1 -ovc x264 -x264encopts
qp_constant=19:fullinter:cabac:iframe=200:psnr
subpel_refine=1:PSNR Global:46.82 kb/s:1048.1 fps:17.335
subpel_refine=2:PSNR Global:46.83 kb/s:1034.4 fps:16.970
subpel_refine=3:PSNR Global:46.84 kb/s:1023.3 fps:14.770
subpel_refine=4:PSNR Global:46.87 kb/s:1010.8 fps:11.598
subpel_refine=5:PSNR Global:46.88 kb/s:1006.9 fps:10.824"
And
"The current code for calculating the cost of encoding which reference
frame a MB is predicted from, introduces a bias towards ref0 and
against P16x16.
Removing this bias produces an improvement of .4% - 2% bitrate,
depending on content and number of reference frames."

r46
* x264: added --ipratio --pbratio in help section.

r45
* ratecontrol: patch by Loren Merritt.
"Use average qp instead of last qp in the frame for 2pass rc.
(Improves quality and rate accuracy if the first pass was cbr.)"

r44
* x264: added --quiet and --no-psnr.

r43
* eval.c: lalala ;)

r42
* added Loren Merritt.

r41
* all: added eval.c (I hope libx264.dsp is correct, I can't test).

r40
* all: 2pass patch by Loren Merritt <lorenm AT u.washington DOT edu>
"Mostly borrowed from libavcodec.
There is not much theoretical basis behind my choice of defaults for
rc_eq, qcompress, qblur, and ip_factor."

r39
* all: first part of the 2pass patch by Loren Merritt
(only the header/textures bits computed for now).

r38
* all: include stdarg.h (needed for x264_log)

r37
Use x264_log() in ratecontrol.c

r36
* encoder/encoder.c: oops. (fixed compilation).

r35
* all: more fprintf -> x264_log.

r34
* all: added a x264_param_t.analyse.b_psnr

r33
* encoder/encoder.c: kb/s with k=1000 (more consistent). Patch by Loren
Merritt <lorenm AT u DOT washington DOT edu>
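
As a rough illustration of what the k=1000 choice in r33 changes, the sketch below computes an average bitrate from a byte count. The function name and its parameters are hypothetical; this is not the code in encoder/encoder.c.

```python
def bitrate_kbps(total_bytes, frame_count, fps, k=1000):
    """Average bitrate in kb/s for `frame_count` frames played at `fps`.

    `k` is the size of the "kilo" prefix: 1000 as adopted in r33,
    or 1024 for comparison.
    """
    duration_s = frame_count / fps
    return total_bytes * 8 / duration_s / k


# 1,000,000 bytes over 250 frames at 25 fps (10 seconds of video)
decimal_rate = bitrate_kbps(1_000_000, 250, 25)         # k = 1000
binary_rate = bitrate_kbps(1_000_000, 250, 25, k=1024)  # k = 1024
```

The same stream reports 800 kb/s with k=1000 and 781.25 kb/s with k=1024, which is why a single convention matters when comparing against a target bitrate.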

r32
* all: introduced a x264_log function. It's not yet used everywhere
but we should start using it :)

r31
OS X is missing exp2f()

r30
r29
Add my svn user name.

r28
Bugfix.

r27
Include timing info in VUI.
Change frame rate from float to fraction (sorry for the inconvenience).
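
A small sketch of why a fractional frame rate is preferable to a float when timing information is written into the stream. The NTSC rate 30000/1001 is used as an example and does not come from this changelog; frame_time is a hypothetical helper, not an x264 function.

```python
from fractions import Fraction


def frame_time(frame_index, fps):
    """Presentation time in seconds of frame `frame_index` at rate `fps`."""
    return frame_index / fps


ntsc = Fraction(30000, 1001)   # exact rate as numerator/denominator
approx = 29.97                 # common float approximation of the same rate

exact_t = frame_time(30000, ntsc)     # an exact Fraction
float_t = frame_time(30000, approx)   # a float, slightly too large
drift = float_t - float(exact_t)      # error accumulated after 30000 frames
```

With the fraction, frame 30000 lands on exactly 1001 seconds; with the float approximation it is about one millisecond late, and the error keeps growing with the frame count.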

r26
Add TAGS rule.

r25
Fixes by Loren Merritt (lorenm at u.washington.edu).

r24
Get rid of integer overflows that caused the rate control to go
haywire in some situations.

r23
* encoder: correct range for i_idr_pic_id is 0..65535
(Not 0..65534)
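
The off-by-one in r23 can be illustrated with a wrapping counter. How x264 actually advances i_idr_pic_id is not shown in this changelog, so both functions below are hypothetical and only demonstrate the range difference.

```python
IDR_PIC_ID_MAX = 65535  # inclusive upper bound stated in r23


def next_idr_pic_id(current):
    """Advance the counter, wrapping to 0 only after IDR_PIC_ID_MAX."""
    return (current + 1) % (IDR_PIC_ID_MAX + 1)


def next_idr_pic_id_off_by_one(current):
    """Wraps one step early, so the value 65535 is never produced."""
    return (current + 1) % IDR_PIC_ID_MAX
```

With the correct modulus, 65534 is followed by 65535 and then 0; with the early wrap, 65534 is followed directly by 0.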

r22
ratecontrol: patch by Loren Merritt <lorenm AT u DOT washington DOT edu>
"The new cbr mode fails to completely disable itself when encoding in
constant QP mode. The per-block QPs are then randomized between QP+4 and
QP-2 based on uninitialized ratecontrol parameters."

r21
* ratecontrol: patch by Måns Rullgård <mru AT mru DOT ath DOT cx>
"This patch fixes a small bug (divide by 0 possible) in the rate control."

r20
* encoder: simpler scene cut detection (seems better but does not check
size anymore, so needs more testing).

r19
* all: Change the way PSNR is computed (based on a patch by Loren
Merritt <lorenmn AT u DOT washington DOT edu>)
Using SQE(DeltaSourceReconstructed) = Sum( delta^2 )
PSNR( SQE, Size ) = -10 * Ln( SQE / 255^2 / Size ) / Ln(10)
Y+U+V : Union of YUV planes.
Now there is
- Mean PSNR : Sum( PSNR( SQE(Y/U/V), Size(Y/U/V) ) ) / TotalFrames
- Average PSNR: Sum( PSNR( SQE(Y+U+V), Size(Y+U+V) ) ) / TotalFrames
- Global PSNR: PSNR( Sum( SQE(Y+U+V) ), Size(Y+U+V)*TotalFrames )
Mean PSNR is used by the JM, and Average/Overall is used on Doom9 for
example.
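
The three PSNR figures defined in r19 can be sketched as follows. This is one reading of the formulas above, with Mean PSNR taken per plane; the function names, the dict layout and the sample numbers are invented for illustration and are not x264 code.

```python
import math

PEAK = 255.0  # 8-bit samples, matching the 255^2 term in the formula


def psnr(sqe, size):
    """PSNR in dB from a sum of squared errors over `size` samples.

    Assumes sqe > 0; a perfect reconstruction would need special handling.
    """
    return -10.0 * math.log10(sqe / PEAK ** 2 / size)


def psnr_summary(frames, sizes):
    """Return (mean per plane, average, global) PSNR.

    frames: list of {"Y": sqe, "U": sqe, "V": sqe}, one dict per frame
    sizes:  {"Y": n, "U": n, "V": n}, samples per plane in one frame
    """
    n = len(frames)
    all_size = sum(sizes.values())                        # Size(Y+U+V)
    frame_sqe = [sum(f[p] for p in sizes) for f in frames]
    # Mean: per-plane PSNR of each frame, averaged over frames
    mean = {p: sum(psnr(f[p], sizes[p]) for f in frames) / n for p in sizes}
    # Average: PSNR of the combined planes of each frame, averaged over frames
    average = sum(psnr(s, all_size) for s in frame_sqe) / n
    # Global: one PSNR over the error of the whole sequence
    global_psnr = psnr(sum(frame_sqe), all_size * n)
    return mean, average, global_psnr


# Two tiny made-up 4:2:0 frames: the first has a much noisier luma plane.
SIZES = {"Y": 400, "U": 100, "V": 100}
FRAMES = [
    {"Y": 26010.0, "U": 650.25, "V": 650.25},
    {"Y": 2601.0, "U": 650.25, "V": 650.25},
]
MEAN, AVERAGE, GLOBAL = psnr_summary(FRAMES, SIZES)
```

For this made-up input the Average PSNR is about 35.8 dB while the Global PSNR is about 34.0 dB: the global figure sums the errors before taking the logarithm, so a single bad frame pulls it down further.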

r18
* x264.h: increased X264_BUILD.

r17
* all: Patch from Måns Rullgård <mru AT mru DOT ath DOT cx>
"Here's a patch that adds some kind of rate control. I suppose it is
by no means perfect, but it's much better than constant quantizer. It
also has a very crude scene change detection that sometimes avoids a
buffer underflow by reencoding oversized P/B frames as I frames."

r16
Linux PPC AltiVec fix

r15
BeOS fixes (no stdint.h, no libm)

r14
Attempt to fix build on Linux PPC

r13
* encoder.c, analyse.c, macroblock: fixed when using a qp per MB.
(Buggy for pskip and mb with null cbp luma and chroma).
* dct*: fixed order of idct.

r12
* cpu.asm: mmh trashing ebp,esi and edi isn't a good idea I fear ;)

r11
* all: fixed sse2 runtime selection.

r10
update & SSE2 support

r9
update

r8
remove some unused code

r7
support for building checkasm.exe

r6
* build fix (thx xxcd).

r5
* TODO: test.

r4
* vfw/* : oops...

r3
* mc-c.c compilation fix for gcc >= 3.3

r2
re-import of the CVS.




Alternative to x264 Encoder:
TMPGEnc Video Mastering Works
TotalCode Studio



Guides and How to's:
Creating / Converting Blu-rays with x264 (command line options)
x264 Encoding Options Explained
X264 Settings explained


Acronyms / Also Known As:
x264 cli, x264cli

Comments

16 comments, Showing 1 to 16 comments

Superb!!!


Posted November 30, 2013 by . Tool version 0.130.2273 using OS WinXP
Ease of use 9 of 10 Functionality 8 of 10 Value for money 10 of 10 Overall score 9 of 10


Brilliant software.

I use this commandline tool as part of a conversion sequence to turn TV captures into smaller .mp4s for later playback on a "WDTV Live" box.

The X264 commandline can be daunting to figure out initially (examples abound though, just search) but once you have a useful commandline then Bob's your Uncle. eg "my" commandline creates h264 video which is proven fully compatible with the WDTV Live in terms of the "video technical compliance stuff". Happy days.

X264, when combined with FFMPEG to convert audio and with MP4box to mux the video/audio into an .mp4, provides you with capability to create your own (repeatable) custom tailored encodes.

Fantastic.



Posted July 30, 2011 by . Tool version 2044 using OS Windows 7 64-bit
Ease of use 4 of 10 Functionality 10 of 10 Value for money 10 of 10 Overall score 9 of 10


Extreme compression might be a very good feature for sharing, contrary to my previous comment. Still figuring out quality settings for personal backup.

Other ripping tools like Xvid4PSP, StaxRip, RipBot264, FairUse Wizard, MEGUI must be updated to this version accordingly.



Posted September 03, 2010 by . Tool version Version:r1703 using OS Linux
Ease of use 9 of 10 Functionality 9 of 10 Value for money 9 of 10 Overall score 9 of 10


r1703 gives better compression, but the video loses overall sharpness.
It's disappointing.
Target video bit rate: 1 500 Kbps
Actual video bit rate: 817 Kbps (far below the target, which results in poor quality)


Hope for better improvement in next release.



Posted August 26, 2010 by . Tool version r1703 using OS Windows 7
Ease of use 9 of 10 Functionality 9 of 10 Value for money 9 of 10 Overall score 9 of 10


Simply the best implementation of H.264 spec.

It is a CLI tool so some patience is required to learn it, otherwise use some great GUI's like
Ripbot, StaxRip, or MeGUI.



Posted July 13, 2010 by . Tool version 1666 using OS Windows 7
Ease of use 5 of 10 Functionality 10 of 10 Value for money 10 of 10 Overall score 10 of 10


The BEST,
and my most favorite video encoder.

Thanks for continuous updates.

Note:-
======
x264 vfw requires same trends for updates too!



Posted May 30, 2010 by . Tool version r1613 using OS WinXP
Ease of use 9 of 10 Functionality 9 of 10 Value for money 9 of 10 Overall score 9 of 10


Simply The Best H264 encoder available, no doubt.
Thanks to authors for keeping FREE,
and running a good show of updates.



Posted May 17, 2010 by . Tool version 1592 using OS WinXP
Ease of use 9 of 10 Functionality 9 of 10 Value for money 9 of 10 Overall score 9 of 10


By far the best H.264 encoder I've ever experienced. It even dwarfed all these commercial products and it's getting better!



Posted May 01, 2010 by . Tool version 1570 using OS WinXP
Ease of use 5 of 10 Functionality 10 of 10 Value for money 10 of 10 Overall score 10 of 10


Unless your device does not support H.264 / MPEG-4 Part 10 (AVC), there is NO REASON why you should not be using x264. Even if you are not a console god, there are plenty of GUIs that harness the power of this codec implementation.


Posted November 30, 2009 by . Tool version 1354 using OS Windows 7 64-bit
Ease of use 10 of 10 Functionality 10 of 10 Value for money 10 of 10 Overall score 10 of 10


I have been an AVI with XviD and MP3 die hard fan for a LONG time! I just recently graduated to using MKV files and building my own chapters. THEN, I discovered that I can encode H264/X264 files and *directly* mux the AC3 audio from a DVD rip into an MKV.

What I did NOT expect was the quality of video at such low bit rates.

I use my Western Digital WD TV very extensively to play the videos I make. When using XviD to encode 720p videos (1280x720), I *must* run at least 5000 kbps to have a decent picture. With H264 (or better yet, x264.exe) the video quality is superior, at only 2500 kbps!!!

Now I wish my Creative Zen would support 264, because it is leaps and bounds better than WMV9!!

I love this CLI tool, thanks so much to the author(s)!!! I hope to have a GUI built soon, and have plans on making a GUI tool kit for MKVs! Thanks!



Posted February 02, 2009 by . Tool version r1097 using OS WinXP
Ease of use 10 of 10 Functionality 10 of 10 Value for money 10 of 10 Overall score 10 of 10


With this codec, you have DVD-like picture quality at VCD bitrates!
IMO, the future is here, in this codec!

I use it with super encoder to batch convert dvb mpeg2 files of various framesizes. The speed is 1/4 realtime on my core 2 duo 6600.

The VFW version is faster (about 2/5 realtime). You can use it with VirtualDub.

An excellent choice for those who like cutting edge solutions, or something with great future in front of it!



Posted April 25, 2007 by . Tool version cor54 rev600 using OS WinXP
Ease of use 10 of 10 Functionality 10 of 10 Value for money 10 of 10 Overall score 10 of 10


X264 is the best codec I ever used. Thanks to DeathToSheep for the unofficial VFW version I can stay using it with virtualdub.
I capture with Mainconcept PVR in MPEG2 (quality 32) and convert with VirtualdubMPG to AVI files (X264 -single pass bitrate 800)
With this combination of videotools I can put 13 episodes (50minutes/episode) of my favorite "Aspe murders" soap on 1 DVD and the quality is much,much better than VHS.



Posted December 05, 2006 by . Tool version 6.00 using OS WinXP
Ease of use 8 of 10 Functionality 10 of 10 Value for money 10 of 10 Overall score 9 of 10


After being skeptical about AVC H.264 I finally broke down and decided to try it for some iPod movies. The source files were MPEG-1 @ 1856kbps ripped from some high bitrate xVCD's I did years ago, I tried doing these with 3ivX and DivX 6.2.5 and wasn't pleased with the results especially on HDTV, I tried x264 using MeGUI and I am blown away by the 2-pass quality @ 700kbps. Even at a low resolution of 352x144 these movies look good (not great)on a 42" HDTV and the file sizes are quite small. MeGUI is a very powerful program but not exactly for noobs, When the ability for 640x480 iPod resolutions becomes possible this codec will be unstoppable!!


Posted September 24, 2006 by . Tool version 565 using OS WinXP
Ease of use 7 of 10 Functionality 9 of 10 Value for money 10 of 10 Overall score 9 of 10


What's xvid?? what's divx?? what's wmv??

No way. X264 is the best codec ever.

High quality in low bitrate.

I have been using this awesome codec for months. No rival.

Cauptain



Posted January 27, 2006 by . Tool version 409 using OS WinXP
Ease of use 10 of 10 Functionality 10 of 10 Value for money 10 of 10 Overall score 10 of 10


Better quality than XVID and a smaller file size. Use the latest FFDshow from Celtic_Druid for playback. Be aware that playback is CPU intensive - Not designed for < 2.0Ghz machines (yet).


Posted June 15, 2005 by . Tool version 263 using OS WinXP
Ease of use 8 of 10 Functionality 9 of 10 Value for money 10 of 10 Overall score 9 of 10


works now correctly in sony vegas - testing quality against other H264, but so far - this is a winner ..




Posted June 14, 2005 by . Tool version Revision 261 using OS WinXP
Ease of use 8 of 10 Functionality 10 of 10 Value for money 10 of 10 Overall score 9 of 10


©1999-2014 videohelp.com