x264 Encoder
x264 is an open source H.264/AVC video encoder. The x264 CLI is a command-line encoder tool and is used in several converters such as Handbrake, Xvid4PSP, StaxRip, RipBot264, FairUse Wizard and MEGUI.
Free software. Size: 9.8MB
Rating: 9.2/10 (15 votes)
Latest version: r2200 (May 23, 2012)
More information and other downloads: Unofficial x264 VFW codec builds (Komisar's and one other) are also available. They let you use x264 in, for example, VirtualDub or other software that supports Video for Windows (VFW) codecs, for both encoding and decoding.
x264 Encoder GUIs/Frontends: Handbrake, Xvid4PSP, StaxRip, RipBot264, FairUse Wizard, MEGUI, AutoX264, HDConvertToX.
Section: Video Encoders (H264/MP4/MKV)
Version history:
r2200 Threaded lookahead
Split each lookahead frame analysis call into multiple threads. Has a small impact on quality, but does not seem to be consistently any worse. This helps alleviate bottlenecks with many cores and frame threads. In many cases, this massively increases performance on many-core systems. For example, over 100% faster 1080p encoding with --preset veryfast on a 12-core i7 system. Realtime 1080p30 at --preset slow should now be feasible on real systems. For sliced-threads, this patch should be faster regardless of settings (~10%). By default, lookahead threads are 1/6 of regular threads. This isn't exacting, but it seems to work well for all presets on real systems. With sliced-threads, it's the same as the number of encoding threads.
r2199 Add support for RGB formats in bit-depth conversion filter
r2198 Fix some bugs in mb_info code
r2197 Add mb_info API for signalling constant macroblocks
Some use-cases of x264 involve encoding video with large constant areas of the frame. Sometimes, the caller knows which areas these are, and can tell x264. This API lets the caller do this and adds internal tracking of modifications to macroblocks to avoid problems. This is really only suitable without B-frames. An example use-case would be using x264 for VNC.
r2196 Faster chroma weight cost calculation
New assembly function with SSE2, SSSE3 and XOP implementations for calculating absolute sum of differences.
r2195 Add Level 5.2 support
r2194 Eradicate all mention of Extended Profile
x264 never supported it and never will because nobody uses it.
r2193 Fix disabling of mbtree when using 2pass encoding and zones
r2192 configure: force select -mXX gcc option for i386/x86-64
Makes multilib compilation more convenient.
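The default described in the r2200 entry (lookahead threads at one sixth of the regular thread count, or equal to the encoding thread count with sliced-threads) can be sketched as below. This is an illustration only, not x264's code: the function name `default_lookahead_threads` is hypothetical, and the floor of one thread is an assumption, since the changelog does not say how small thread counts are rounded.

```python
def default_lookahead_threads(threads: int, sliced_threads: bool = False) -> int:
    """Estimate the default lookahead thread count described in r2200.

    Hypothetical helper: the name and the minimum of 1 are assumptions.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    if sliced_threads:
        # With sliced-threads, lookahead uses as many threads as encoding.
        return threads
    # Otherwise roughly 1/6 of the regular threads (the floor of 1 is assumed).
    return max(1, threads // 6)


print(default_lookahead_threads(12))       # regular frame threads
print(default_lookahead_threads(8, True))  # sliced-threads
```

An explicit lookahead thread count set by the user would override any such default; the sketch covers only the automatic case.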
r2191 Update config.guess and config.sub
Adds support for a bunch of targets, including: aarch64 (armv8), arm-linux-androideabi
r2190 configure: correct use of RC variable and add --extra-rcflags
r2189 ICL/MSVS: Fix shared library generation and usage
MSVS requires exported variables to be declared with the DATA keyword, and requires that imported variables be declared with dllimport. This does not fix x264 cli being unable to use a shared library built by ICL however.
r2188 Fix intra-refresh + hrd
r2187 Fix frame input colorspace check
r2186 Fix comment in deblock.c
The code does, in fact, handle CAVLC+8x8dct correctly already.
r2185 Fix sliced-threads ratecontrol bug
Was using qp instead of qscale; could cause NANs (not to mention less accurate results).
r2184 Fix clobbering of mutex/cvs
Regression in r2183. Bizarrely seemed to work on many platforms, but crashed on win64 and may have been slower. Only affected sliced threads during encoding, but could cause crashes on x264 encoder close even without sliced threads.
r2183 Sliced-threads: do hpel and deblock after returning
Lowers encoding latency around 14% in sliced threads mode with preset superfast. Additionally, even if there is no waiting time between frames, this improves parallelism, because hpel+deblock are done during the (singlethreaded) lookahead. For ease of debugging, dump-yuv forces all of the threads to wait and finish instead of setting b_full_recon.
r2182 Add full-recon API option
Fully reconstruct frames even without dump-yuv.
r2181 x86inc: switch to amdnops
Recent AMD CPUs' instruction decoders choke horribly on extremely long nops (i.e. with 4 prefixes). Won't affect much, since we don't use ALIGN much.
r2180 BMI1 decimate functions
Intel was nice enough to make tzcnt equal to "rep bsf", which is backwards-compatible. This means we don't actually have to add new functions to make it work.
r2179 Minor asm changes
r2178 Add row-reencoding support to VBV for improved accuracy
Extremely accurate, possibly 100% so (I can't get it to fail even with difficult VBVs). Does not yet support rows split on slice boundaries (occurs often with slice-max-size/mbs). Still inaccurate with sliced threads, but better than before.
r2177 Abstract bitstream backup/restore functions
Required for row re-encoding.
r2176 Add a small per-MB cost penalty for lowres
Helps avoid VBV predictors going nuts with very low-cost MBs. One particular case this fixes is zero-cost MBs: adaptive quantization decreases the QP a lot, but (before this patch), no cost penalty gets factored in for this, because anything times zero is zero.
r2175 Remove explicit run calculation from coeff_level_run
Not necessary with the CAVLC lookup table for zero run codes.
r2174 Export PSNR/SSIM in x264 API
r2173 x86inc: support yasm -f win64
Not necessary for x264, as -m amd64 already does the right thing, but used by external users of x86inc.
r2172 Fix incorrect zero-extension assumptions in x86_64 asm
Some x264 asm assumed that the high 32 bits of registers containing "int" values would be zero. This is almost always the case, and it seems to work with gcc, but it is *not* guaranteed by the ABI. As a result, it breaks with some other compilers, like Clang, that take advantage of this in optimizations. Accordingly, fix all x86 code by using intptr_t instead of int or using movsxd where necessary. Also add checkasm hack to detect when assembly functions incorrectly assume that 32-bit integers are zero-extended to 64-bit.
r2171 Fix possible alignment crash when linking from MSVC
x264_cavlc_init needs to be stack-aligned now.
r2170 Fix rare overflow in 10-bit intra_satd_x3_16x16 asm
r2169 ICL: fix out of tree building and resource file usage on Windows
r2168 Add error handling for out-of-tree build
r2167 Fix RGB colorspace input
BGR/BGRA input was correct.
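The r2172 entry turns on the difference between trusting a 64-bit register to hold a clean 32-bit value and explicitly sign-extending its low half. The sketch below models that in Python under stated assumptions: registers are plain integers, and `movsxd` and `naive_use` are hypothetical helpers, the first mimicking what the x86-64 instruction of the same name does to a 32-bit source.

```python
MASK32 = 0xFFFFFFFF


def movsxd(reg64: int) -> int:
    """Sign-extend the low 32 bits of a 64-bit register value (model only)."""
    low = reg64 & MASK32
    return low - (1 << 32) if low & 0x80000000 else low


def naive_use(reg64: int) -> int:
    """Use the whole register, assuming the high 32 bits are already zero."""
    return reg64


# A register whose low half holds the int 16 but whose high half has stale bits.
dirty = (0xDEADBEEF << 32) | 16

print(naive_use(dirty) == 16)  # False: the stale high bits leak into the value
print(movsxd(dirty) == 16)     # True: only the low 32 bits are used
print(movsxd(0xFFFFFFFF))      # -1: negative ints are extended correctly
```

Code that behaves like `naive_use` works only as long as the compiler happens to leave the high half zero, which is why the entry describes the bug as appearing with some compilers and not others.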
r2166 Fix interlaced + extremal slice-max-size
Broke if the first macroblock in the slice exceeded the set slice-max-size.
r2165 Fix regression in r2141
Broke register preservation in x264_cpu_cpuid and x264_cpu_xgetbv. Did not cause any problems.
r2164 TBM, AVX2, FMA3, BMI1, and BMI2 CPU detection support
TBM and BMI1 are supported by Trinity/Piledriver. The others (and BMI1) will probably appear in Intel's upcoming Haswell. Also update x86inc with AVX2 stuff.
r2163 x86inc: add TAIL_CALL macro to abstract a common asm idiom
r2162 Minor asm optimizations/cleanup
r2161 Clean up and optimize weightp, plus enable SSSE3 weight on SB/BDZ
Also remove unused AVX cruft.
r2160 XOP frame_init_lowres
Covers both 8-bit and 16-bit, ~5-10% faster on Bulldozer.
r2159 XOP 8x8 zigzags
Field: 35(mmx) ->16(xop) cycles
Frame: 32(ssse3)->20(xop) cycles
r2158 AVX 32-bit hpel_filter_h
Faster on Sandy Bridge. Also add details on unsuccessful optimizations in these functions.
r2157 x86inc: add high halfword register support
Might be useful in a few cases.
r2156 Change %ifdef directives to %if directives in *.asm files
This allows combining multiple conditionals in a single statement.
r2155 Use TV range algorithm for bit-depth conversions
Such sources are more common, so better to be correct for the common case. This also produces less error for the case of full range than the previous algorithm produced for the case of TV range.
r2154 Bump dates to 2012
r2153 Add Windows resource file
Displays version info in Windows Explorer.
r2152 Fix win32 pthread_cond_signal
Isn't used by x264 currently, so didn't cause a problem. Fix backported from libav.
r2151 ARM: align asm functions to 4 bytes.
Some linkers apparently fail to correctly align ARM functions when mixing with Thumb code.
r2150 Fix normalization of colorspace when input is packed YUV 4:2:2
r2149 Force keyint-min 1 with Blu-ray
Fixes an issue with referencing across I-frames that's prohibited in Blu-ray for some godforsaken reason.
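The trade-off mentioned in the r2155 entry can be illustrated with plain arithmetic. The sketch below is not x264's filter code; it assumes the usual convention that 8-bit TV-range luma spans 16..235 and its 10-bit counterpart spans 64..940, so a TV-range conversion is a left shift by two, while a full-range conversion rescales 0..255 onto 0..1023. Both function names are hypothetical.

```python
def tv_range_8_to_10(v: int) -> int:
    """TV (limited) range: 16..235 maps exactly onto 64..940 via a shift."""
    return v << 2


def full_range_8_to_10(v: int) -> int:
    """Full range: rescale 0..255 onto 0..1023 with rounding."""
    return (v * 1023 + 127) // 255


for v in (16, 235, 255):
    print(v, tv_range_8_to_10(v), full_range_8_to_10(v))
```

Applying the shift to full-range input leaves peak white at 1020 instead of 1023, a small error, which fits the entry's remark that the TV-range algorithm is the less harmful default when the range is guessed wrong.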
r2148 Fix crash in --demuxer y4m with unsupported colorspace
r2147 Fix overread/possible crash with intra refresh + VBV
r2146 Fix trellis 2 + subme >= 8
Trellis didn't return a boolean value as it was supposed to. Regression in r2143-5.
r2145 CABAC trellis opts part 4: x86_64 asm
Another 20% faster. 18k->12k codesize. This patch series may have a large impact on encoding speed. For example, 24% faster at --preset slower --crf 23 with 720p parkjoy. Overall speed increase is proportional to the cost of trellis (which is proportional to bitrate, and much more with --trellis 2).
r2144 CABAC trellis opts part 3: make some arrays non-static
r2143 CABAC trellis opts part 2: C optimizations
Hoist the branch on coef value out of the loop over node contexts. Special cases for each possible coef value (0,1,n). Special case for dc-only blocks. Template the main loop for two common subsets of nodes, to avoid a bunch of branches about which nodes are live. Use the nonupdating version of cabac_size_decision in more cases, and omit those bins from the node struct. CABAC offsets are now compile-time constants. Change TRELLIS_SCORE_MAX from a specific constant to anything negative, which is cheaper to test. Remove dct_weight2_zigzag[], since trellis has to lookup zigzag[] anyway. 60% faster on x86_64. 25k->18k codesize.
r2142 CABAC trellis opts part 1: minor change in output
Due to different tie-break order.
r2141 x86inc improvements for 64-bit
Add support for all x86-64 registers. Prefer caller-saved register over callee-saved on WIN64. Support up to 15 function arguments.
r2140 High bit depth SSE2/AVX add8x8_idct8 and add16x16_idct8
From Google Code-In.
r2139 MMX/SSE2/AVX predict_8x16_p, high bit depth fdct8
From Google Code-In.
r2138 XOP 8-bit fDCT
Use integer MAC for one of the SUMSUB passes. About a dozen cycles faster for 16x16.
r2137 High bit depth intra_sad_x3_4x4
From Google Code-In.
r2136 Use a large LUT for CAVLC zero-run bit codes
Helps the most with trellis and RD, but also helps with bitstream writing. Seems at worst neutral even in the extreme case of a CPU with small L2 cache (e.g. ARM Cortex A8).
r2135 High bit depth intra_sad_x3_8x8, intra_satd_x3_4x4/8x8c/16x16
Also add an ACCUM macro to handle accumulator-induced add-or-swap more concisely.
r2134 MMX 10-bit predict_8x8c_h and predict_8x16c_h
From Google Code-In.
r2133 Some MBAFF x86 assembly functions.
deblock_chroma_420_mbaff, plus 422/422_intra_mbaff implemented using existing functions. From Google Code-In.
r2132 More ARM NEON assembly functions
predict_8x8_v, predict_4x4_dc_top, predict_8x8_ddl, predict_8x8_ddr, predict_8x8_vl, predict_8x8_vr, predict_8x8_hd, predict_8x8_hu. From Google Code-In.
r2131 More 4:2:2 asm functions
High bit depth version of deblock_h_chroma_422. Regular and high bit depth versions of deblock_h_chroma_intra_422. High bit depth pixel_vsad. SSE2 high bit depth and MMX 8-bit predict_8x8_vl. Our first GCI patch this year!
r2130 SSE2 and SSSE3 versions of sub8x16_dct_dc
Also slightly faster sub8x8_dct_dc
r2129 Resize filter updates
Use AVPixFmtDescriptors to pick the most compatible x264 csp for any pixel format. Fix deprecated use of av_set_int. Now requires libavutil >= 51.19.0
r2128 Add out-of-tree build support
r2127 Limit SSIM to 100db
Avoids floating point error for infinite SSIM (lossless).
r2126 Fix wrong conditional inclusion of inttypes.h
inttypes.h is required by encoder/ratecontrol.c for SCNxxx macros, and HAVE_STDINT_H does not imply having inttypes.h. stdint.h is a subset of inttypes.h, but this isn't enough for x264. This change fixes building x264 with Android's toolchain.
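The cap in the r2127 entry matters because a lossless encode has an SSIM of exactly 1, and a logarithmic dB scale diverges there. The sketch below assumes the commonly used mapping of SSIM to decibels, -10 * log10(1 - ssim); the function name `ssim_db` and the exact threshold are illustrative assumptions, not a copy of x264's source.

```python
import math

MAX_DB = 100.0  # cap taken from the changelog entry


def ssim_db(ssim: float) -> float:
    """Convert an SSIM score to dB, capping the result at 100 dB."""
    inv = 1.0 - ssim
    if inv <= 1e-10:
        # -10 * log10(1e-10) is 100, so anything closer to 1 is clamped.
        return MAX_DB
    return -10.0 * math.log10(inv)


print(ssim_db(0.99))  # a typical lossy result
print(ssim_db(1.0))   # lossless: capped instead of dividing into infinity
```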
r2125 Fix crash with sliced threads and input height <= 112
r2124 Fix loading custom 8x8 chroma quant matrices in 4:4:4
r2123 Fix PCM cost overflow
r2122 Fix overflow in 8-bit x86 vsad asm function
r2121 Fix crash in --fullhelp when compiled against recent ffmpeg
Don't assume all pixel formats have a description.
r2120 Fix regression in r2118
Broke trellis with i16x16 macroblocks.
r2119 Modify MBAFF chroma deblock functions to handle U/V at the same time
Allows for more convenient asm implementations.
r2118 CABAC trellis optimizations: use SIMD quant
Significant speed increase, minor change in output due to rounding.
r2117 YUV range detection and support for x264CLI
Two new options: --input-range and --range. --input-range forces the range of the input in case of misdetection; auto by default. --range sets the range of the output; x264cli will convert if necessary, TV by default. --fullrange is now removed as a CLI option (but the libx264 API is unchanged).
r2116 Pass through user data
r2115 Remove unpredictable branch in CABAC dqp
r2114 x86inc: AVX symmetry optimization
3-arg AVX ops with a memory arg can only have it in src2, whereas SSE emulation of 3-arg prefers to have it in src1 (i.e. the move). So, if the op is symmetric and the wrong one is memory, swap them. Eliminates redundant moves in some cases when using 3-operand without AVX with memory arguments. Also fix movss and movsd in some cases, and flag shufps correctly as float.
r2113 checkasm: shut up gcc warnings, fix some naming of functions in results
r2112 checkasm: fix build on ARM
Because of how ALIGNED_ARRAY_16 is defined on ARM, array initialisers cannot be used here. Use memset() instead.
r2111 Improve makefile rules
Remove the need for "make clean" after most reconfigures.
r2110 Mark some local functions as static, cosmetics
r2109 Fix crash if timecode file opening fails
r2108 Configure: force PIC for shared build on PARISC and MIPS
r2107 Improve yasm version check
Previous check allowed certain earlier versions that weren't fully compatible.
r2106 Add fenc prefetching to adaptive quant
Many fewer cache misses, faster adaptive quant.
r2105 Split prefetch_fenc between colorspaces
Add 4:2:2 version.
r2104 Some more 4:2:2 x86 asm
coeff_last8, coeff_level_run8, var2_8x16, predict_8x16c_dc, satd_4x16, intra_mbcmp_8x16c_x3, deblock_h_chroma_422
r2103 Remove obsolete versions of intra_mbcmp_x3
intra_mbcmp_x3 is unnecessary if x9 exists (SSSE3 and onwards).
r2102 SSSE3/SSE4/AVX 9-way fully merged i8x8 analysis (sa8d_x9)
x86_64 only for now, due to register requirements (like sa8d_x3).
i8x8 analysis cycles (per partition):
penryn    sandybridge  bulldozer
616->600  482->374     418->356   preset=faster
892->632  725->387     598->373   preset=medium
948->650  789->409     673->383   preset=slower
r2101 SSSE3/SSE4/AVX 9-way fully merged i8x8 analysis (sad_x9)
~3 times faster than current analysis, plus (like intra_sad_x9_4x4) analyzes all modes without shortcuts.
r2100 Merge i4x4 prediction with intra_mbcmp_x9_4x4
Avoids a redundant prediction after analysis.
r2099 Inline i4x4/i8x8 encode into intra analysis
Larger code size, but faster.
r2098 Initial XOP and FMA4 support on AMD Bulldozer
~10% faster Hadamard functions (SATD/SA8D/hadamard_ac) plus other improvements.
r2097 ARM: update NEON chroma deblock functions to NV12 pixel format
r2096 Add /usr/lib/{64/}values-xpg6.o to $LDFLAGS on Solaris
This is required for POSIX.1-2001 compliance.
r2095 Fix linker test for -Bsymbolic
The Solaris linker only accepts -Bsymbolic for objects compiled in dynamic mode (i.e. shared objects), so pass -shared to gcc. Additionally, for x86_32 unresolved textrels cause a linker error so mark the .text section as 'impure'.
r2094 Add $SOFLAGS to exported SOFLAGS make variable
r2093 Allow setting a chroma format at compile time
Gives a slight speed increase and significant binary size reduction when only one chroma format is needed.
r2092 Improve profile help
List high422/high444 profiles, and don't show non-high-bit-depth profiles in high bit depth builds.
r2091 Fix infinite loop parsing TDecimate Mode 3 timecode v1 files
r2090 Fix some integer overflows/signedness errors found by IOC
The only real bug here is in slicetype.c, which may or may not affect real encodes.
r2089 Fix pixel_var2 with 4:2:2 encoding
Might have caused artifacts or suboptimal chroma compression.
r2088 Fix chroma intra analysis in 4:4:4 lossless mode
r2087 Fix use of uninitialized MVs in sub8x8 RDO
r2086 Fix detection of Alpha CPU arch on alphaev67
r2085 Optimize x86 asm for Intel macro-op fusion
That is, place all loop counter tests right before their conditional jumps.
r2084 CAVLC: clean up and restructure
Somewhat faster CAVLC and RD bit-counting.
r2083 CABAC: clean up and restructure
Somewhat faster CABAC and RD bit-counting.
r2082 Some initial 4:2:2 x86 asm
r2081 4:2:2 encoding support
r2080 SSSE3/SSE4 9-way fully merged i4x4 analysis (sad/satd_x9)
i4x4 analysis cycles (per partition):
penryn    sandybridge
184-> 75  157-> 54  preset=superfast (sad)
281->165  225->124  preset=faster (satd with early termination)
332->165  263->124  preset=medium
379->165  297->124  preset=slower (satd without early termination)
This is the first code in x264 that intentionally produces different behavior on different cpus: satd_x9 is implemented only on ssse3+ and checks all intra directions, whereas the old code (on fast presets) may early terminate after checking only some of them. There is no systematic difference on slow presets, though they still occasionally disagree about tiebreaks. For ease of debugging, add an option "--cpu-independent" to disable satd_x9 and any analogous future code.
r2079 Faster intra_mbcmp_x3 for versions without dedicated asm
Select asm subroutines more intelligently in the wrapper functions.
r2078 Optimize x86 intra_predict_4x4 and 8x8
High bit depth Penryn, Sandybridge cycles:
4x4_ddl: 11->10,  9-> 8
4x4_ddr: 15->13, 12->11
4x4_hd:        , 15->12
4x4_hu:        , 14->13
4x4_vr:  15->14, 14->12
8x8_ddl: 32->19, 19->14
8x8_ddr: 42->19, 21->14
8x8_hd:        , 15->13
8x8_hu:  21->17, 16->12
8x8_vr:  33->19,
8-bit Penryn, Sandybridge cycles:
4x4_ddr: 24->15,
4x4_hd:  24->16,
4x4_hu:  23->15,
4x4_vr:  23->16,
4x4_vl:  10-> 9,
8x8_ddl: 23->15,
8x8_hd:        , 17->14
8x8_hu:        , 15->14
8x8_vr:  20->16, 17->13
r2077 Use realistic alignment for intra pred benchmarks in checkasm
r2076 Fix frame packing SEI with --frame-packing 0
According to the spec, when frame_packing_arrangement_type is equal to 0, quincunx_sampling_flag shall be equal to 1.
r2075 Fix install/uninstall shared libs if SYS is WINDOWS/CYGWIN
r2074 Add Hurd support to configure
r2073 Optimize x86 intra_satd_x3_*
~7% faster.
r2072 Optimize x86 intra_sa8d_x3_8x8
~40% faster. Also some other minor asm cosmetics.
r2071 Scale interlaced refs/mvs for mvr predictors
Slightly improves compression and fixes a Valgrind error.
r2070 Optimize predict_8x8_filter and incidentally remove a valgrind false-positive
r2069 Don't override flat SSE2 dequant functions with non-flat AVX ones
Slightly faster.
r2068 Shut up some valgrind false-positives
r2067 Avoid some unnecessary allocations with B-frames/CABAC off
r2066 Fix typo in p8x8 RD analysis
Passed wrong idx to trellis.
r2065 Fix invalid memory accesses in x86 lowres_init when width <= 16
r2064 Fix intermediate conversion for YUVJ* pixfmts with 4:4:4 encoding
r2063 Fix pic_out returned by x264_encoder_encode with 4:4:4
r2062 Fix zeroing of mvr predictors in bskip blocks
r2061 Fix: chroma planes for weightp analysis were not initted if U early-terminates and V doesn't.
r2060 Expand borders before chroma weightp analysis
Prevents mc from using uninitialized source pixels.
r2059 Another 4:4:4 chroma weightp bug fix
r2058 Fix typo in help
r2057 Improve support for varying resolution between passes
Should give much better quality, but still doesn't support MB-tree yet. Also check for the same interlaced options between passes. Various minor ratecontrol cosmetics.
r2056 asm cosmetics: base-4 constants for shuffles
r2055 Enable some existing asm functions that were missing function pointers
pixel_ads1_avx, predict_8x8_hd_avxx High bit depth mc_copy_w8_sse2, denoise_dct_avx, prefetch_fenc/ref, and several pixel*sse4.
r2054 Remove some unused, broken, and/or useless functions
Unused frame_sort. Unused x86_64 dequant_4x4dc_mmx2, predict_8x8_vr_mmx2. Unused and broken high_depth integral_init*h_sse4, optimize_chroma_*, dequant_flat_*, sub8x8_dct_dc_*, zigzag_sub_*. Useless high_depth dequant_sse4, dequant_dc_sse4.
r2053 asm cosmetics: merge all the variants of ABS macros
r2052 asm cosmetics part 2
These were split out of the cpuflags commit because they change the output executable.
r2051 asm cosmetics: INIT_MMX/XMM/YMM now support a cpuflags argument
Reduces the number of macro args that need to be passed around. Allows multiple implementations of a given macro (e.g. PALIGNR) to check cpuflags at the location where the macro is defined, instead of having to select implementations by %define at toplevel. Remove INIT_AVX, as it's replaced by "INIT_XMM avx". This commit does not change the stripped executable.
r2050 Import x86inc.asm patches from libav
r2049 Cosmetics: s/mmxext/mmx2/
r2048 Fix two bugs in 4:4:4 chroma weightp analysis
Caused slightly worse compression.
r2047 Fix "--asm avx"
Previously required "--asm sse2fast,fastshuffle,sse4.2,avx".
r2046 Re-add support for glibc <2.6, which doesn't have CPU_COUNT
r2045 Avoid using deprecated libavformat functions
Replace av_find_stream_info with avformat_find_stream_info. Now requires libavformat 53.3.0 or newer.
r2044 Use assembly versions of some deblocking functions in MBAFF
r2043 Move X264_VERSION / X264_POINTVER from config.h to x264_config.h
This makes them available to external programs as part of the public API.
r2042 Fix padding bug in x264_expand_border_mbpair
r2041 Timecode parsing: Add missing initialization
Fix crash when failed to parse timecode file before malloc pts. Fix detection of user timebase considered to be exceeding H.264 maximum.
r2040 Fix crash with high bitdepth 4:2:0 input
r2039 x86 asm cosmetics
Use FDEC_STRIDEB where appropriate.
r2038 Fix a bug in lossless sub-8x8 RD
Caused crashes in rare cases with lossless encoding. Regression in 4:4:4.
r2037 Improved p8x4/4x8 search decision
Use the same thresholding as for p16x8/8x16. Does p8x4/4x8 search more often, for a small compression improvement.
r2036 Add --subme 11, which disables all early terminations in analysis
Necessary for a future trellis mode decision/motion estimation patch. Also add the slowest presets to the regression test.
r2035 Some trivial changes to RD thresholds
The output-changing portion of the next patch.
r2034 Allow setting a wider range of chroma QP offsets
This allows use of the full range of chroma QP offsets, even in combination with the automatic psy-based adjustments.
r2033 Optimize macroblock_deblock_strength, add more early terminations
r2032 Function-pointerify MBAFF deblocking functions
r2031 Clean up MBAFF deblocking code
r2030 Optimize frame_deblock_row
r2029 Shrink two arrays
r2028 Add support for the new (4:4:4) colorspaces to x264_picture_alloc
r2027 Various cosmetics
r2026 Improve configure help
r2025 Use $optarg for some configure options
r2024 Linux x264_cpu_num_processors(): use glibc macros
The cpu_set_t structure is considered opaque. Also handle sched_getaffinity() error case if "cpusetsize is smaller than the size of the affinity mask used by the kernel."
r2023 Fix spurious "stream properties changed" with --seek option on some inputs
r2022 Fix use of deprecated libavcodec functions
Replace avcodec_open with avcodec_open2. Now requires libavcodec 53.6.0 or newer.
r2021 Fix nalu_process callback with HRD
r2020 Fix incorrect chroma swap for some input pixfmts
Problem occurred if pixfmt of lavf/ffms input was PIX_FMT_RGB24 or PIX_FMT_YUV444P.
r2019 Fix resize filter crash with YUVJ* input pixfmt
r2018 RGB encoding support
Much less efficient than YUV444, but easy to support using the YUV444 framework.
r2017 4:4:4 encoding support
r2016 Properly weight slice header lambda in chroma weightp analysis
r2015 Better x86 high bit depth predict_8x8c_p
Avoid the need to check for corner cases by reordering arithmetic. Also make a minor optimization to high bit depth predict_16x16_p.
r2014 Eliminate extra layer of indirection for sps/pps references
Also remove poc type 1 support (it didn't work anyways) to reduce sps size.
r2013 Fix SSIM calculation with sliced threads
r2012 Avoid possible NaNs in B-frame output stats
r2011 ARM: do not override the toolchain default for FPU ABI
r2010 Fix link errors with libswscale/libavutil as shared libraries
r2009 Fix deprecation in libavformat usage
Replace av_open_input_file with avformat_open_input. Now requires libavformat 53.2.0 or newer.
r2008 Fix various issues with VBV+threads
Eliminate the race condition with interframe row predictors and threads. Recalculate frame_size_estimated at the end of a frame, for improved update_vbv_plan. Some cosmetics.
r2007 Fix MBAFF row VBV ratecontrol
Reverts most of r1984 and implements a much simpler solution.
r2006 Make ratecontrol_mb less slow
r2005 Resize filter updates
Fix use of deprecated sws_getContext. Fix uses of sws_format_name. Fix stream change warning not occurring on the first resolution change. Drop cpu detection, as it is now performed internally by swscale. Update swscale version requirements.
r2004 AVX mbtree_propagate
Up to ~20-30% faster than SSE2 on Sandy Bridge.
r2003 Use -vsync 0 with ffmpeg regression test
r2002 Inline emms instructions on x86 if possible
r2001 Make left_index_table const
Should allow for some missed compiler optimizations in macroblock_cache_load.
r2000 Make --profile main/baseline force off CQMfile
r1999 Fix VBV bug caused by zero i_row_satd value for first and last row
r1998 Fix crash with VBV + forced QP
r1997 Fix VBV bug with MinCR limit
r1996 Fix bitstream reallocation with slice-max-size + MBAFF
r1995 Improve build system capabilities
Make static lib and CLI optional. Support linking CLI to system libx264. Don't strip by default, to match GNU packaging guidelines.
r1994 Slightly speed up x86 CABAC asm
Also make some various cleanups.
r1993 Faster pixel_memset
~4x faster. Also inline plane_expand_border for improved constant propagation.
r1992 Add checkasm tests for memcpy_aligned, memzero_aligned
Also make memcpy_aligned support sizes smaller than 64.
r1991 MBAFF: Add regularization to VSAD metric
Bias towards the MBAFF decisions made in neighboring mb pairs. ~2% better compression on a random 1080i HDTV source.
r1990 MBAFF: Improve handling of bottom row mod32 padding
Force skip on any MBs entirely outside the frame. If an mb pair in the bottom row is chosen to be progressive, re-pad the bottom rows progressively.
r1989 MBAFF: Add frame/field MB stats
r1988 MBAFF: Template direct spatial
r1987 MBAFF: Template cache_load and cache_load_neighbours
r1986 MBAFF: Make interlaced support a compile time option
r1985 MBAFF: Don't call zigzag_init for every mb
r1984 MBAFF: Modify ratecontrol to update every two rows
r1983 MBAFF: Add support for slice-max-size
Also add slice-max-size to the regression tests.
r1982 MBAFF: Add support for slice-max-mbs
r1981 MBAFF: Adaptive quantization
Compute energy for interlaced and progressive choices and pick the least.
r1980 MBAFF: Enable adaptive MBAFF with VSAD decision
r1979 MBAFF: Create a VSAD DSP function
x86 assembly by Jason Garrett-Glaser. This gives roughly 30x speed increase over the C version.
r1978 MBAFF: Direct spatial
r1977 MBAFF: Direct temporal
r1976 MBAFF: Calculate bipred POCs
Need to calculate two tables for the cases where the current macroblock is progressive or interlaced as refs are calculated differently for each.
r1975 MBAFF: Use both left macroblocks for ref_idx calculation
r1974 MBAFF: First edge deblocking
r1973 MBAFF: Implement left edge deblocking functions
r1972 MBAFF: Add extra data to the deblock strength structure
r1971 MBAFF: Deblocking support
r1970 MBAFF: Move common code from deblock functions
r1969 MBAFF: Add mbaff deblock strength calculation
Move call to deblock_strength to x264_macroblock_deblock_strength to keep deblock strength calculation in one place.
r1968 MBAFF: Update x264_cabac_mvd_sum_mmxext to work with larger MVDs.
Author: Loren Merritt <pengvado@akuvian.org>
r1967 MBAFF: Clamp MVDs to 66 instead of 33
r1966 MBAFF: CABAC encoding of skips
r1965 MBAFF: Track what interlace decision the decoder is using
r1964 MBAFF: Fix mvy bounds
Fix MV clipping
r1963 MBAFF: Copy deblocked pixels to other plane
r1962 MBAFF: Disallow skip where predicted interlace flag would be wrong
r1961 MBAFF: Inter support
r1960 MBAFF: Neighbour calculation
Back up intra borders correctly and make neighbour calculation several times longer.
r1959 MBAFF: Store references to the two left macroblocks
r1958 MBAFF: Store left references in a table
r1957 MBAFF: Disable adaptive MBAFF when subme 0 is used
r1956 MBAFF: Save interlace decision for all macroblocks
r1955 Fix bug in NAL buffer resizing
Also properly terminate if NAL buffer resizing fails.
r1954 Fix zone bitrate multiplier and QP forcing in 2-pass mode
Previously zone changes could affect frames outside of the given frame range (around 20 neighboring frames).
r1953 Use float constants in qp rounding
Slight performance improvement and fixes slight difference in output between gcc 3.4 and 4.5.
r1952 Fix bugs with ratecontrol reconfiguration
Initialization of some parameters was missed or wasn't synchronized with other threads.
r1951 More validation of input parameters
This fixes a crash with --me umh and insane values of --me-range.
r1950 Fix bug in --b-adapt 2 with --rc-lookahead >248
Problem caused by buffer overflow in strcpy.
r1949 Check for invalid pixfmts in lavf demuxer
r1948 Fix regression in r1944
Broke sliced-threads + slice-max-size/slice-max-mbs.
r1947 Precalculate CABAC initialization contexts
Slightly faster encoding with lots of slices.
r1946 Avoid redundant log2f calls in mv cost initialization
Saves around 100 million clock cycles on x264 init.
r1945 CABAC residual: cleanup and optimizations
Also kill all Hungarian notation while we're at it. Trim an instruction off cabac_encode_bypass.
r1944 Validate input parameters more carefully
Get rid of redundant warnings upon encoder_reconfig calls. Also avoid encoder_reconfig turning off psy_rd/trellis.
r1943 Fix VFR MB-tree to work as intended
Should improve quality with FPSs much larger or smaller than 25.
r1942 Support more recent GPAC versions
r1941 Fix decoder desync with positive --chroma-qp-offset and zones
r1940 Use AVMEDIA_TYPE_VIDEO instead of deprecated CODEC_TYPE_VIDEO
Fixes build with lavf/lavc 53.
r1939 Force pic-struct for Blu-ray compat + fake-interlaced
r1938 Fix open-gop with no-psy
r1937 Fix build with disabled asm
r1936 Improve Blu-ray compliance
Use dec_ref_pic_marking SEIs to repeat B-ref referencing information. Don't allow B-frames to reference frames outside their minigop.
r1935 Consolidate Blu-ray hacks into --bluray-compat
This option is now required for Blu-ray compatibility. --open-gop bluray is now gone (using bluray-compat and open-gop implies a Blu-ray compatible open-gop). This option doesn't automatically enforce every aspect of Blu-ray compatibility (e.g. resolution, framerate, level, etc).
r1934 Add SSE support to rectangle.h for 16-byte stores
Uses GCC vector intrinsics; may be suboptimal on particularly old GCC versions.
r1933 Do not force Intel Compiler to target pre-mmx architecture for x86
Caused a speed penalty against gcc equivalents.
r1932 Warn users when using --(psnr|ssim) without --tune (psnr|ssim)
This is a counter to the proliferation of incredibly stupid psnr/ssim "benchmarks" of x264 in which the benchmarker conveniently "forgot" --tune psnr/ssim, crippling x264 in the test.
r1931 Remove redundant mbcmp calls in weightp analysis
r1930 Use integer math for filler size calculation
r1929 Disable progress for FFMS input with --no-progress
r1928 Fix bug in intra-refresh ratecontrol
Row SATDs were slightly incorrect.
r1927 Cosmetics: fix some signedness issues found by -Wsign-compare
r1926 Minor fixes
Fix a comment typo. Align an array properly. Make x264_scan8 unsigned: saves a bunch of movsxd instructions on x86_64.
r1925 Improve C99 support checks in configure
Fixes configuration with Intel compiler in some cases.
r1924 Eliminate the possibility of CAVLC level code overflow
Instead, if it happens, just re-encode the MB at higher QPs until it fits.
r1923 x86 SIMD versions of optimize_chroma_dc
SSE2/SSSE3/SSE4/AVX implementations. About 3x faster.
r1922 Add Altivec version of mc_weight
r1921 Add Altivec versions of mbcmp_x functions
These aren't merged versions, they just call the existing asm code. A merged implementation would of course be faster.
r1920 Recognize cygwin as itself when not targeting mingw
Also fix broken thread detection on cygwin.
r1919 Patch Intel's CPU dispatcher
Reduces Intel Compiler's bias against non-Intel CPUs. Big thanks to Agner for the original information on how to do this.
r1918 Intel Compiler support
Big thanks to David Rudie, the original author of this patch.
r1917 Cosmetics: make struct definition braces consistent r1916 Fix restoring of console title on Windows with ffms indexing r1915 Fix possible buffer overflow in mp4 muxer r1914 Remove inline asm syntax not supported by LLVM's assembler Doesn't affect compiled output outside of LLVM. r1913 Fix 10L in r1912 SSSE3 code got used in MMX/SSE2 and vice versa (in hpel). r1912 Add AVX functions where 3+ arg commands are useful r1911 Frame-packing 3D: don't place scenecuts on right views Caused problems for some players. r1910 Improve slice-max-size handling of escape bytes More accurate but a bit slower. Helps deal with a few obnoxious corner cases where the current algorithm failed. r1909 Use bs_write1 wherever possible in header writing r1908 Remove obsolete mvcost init code r1907 Fix memory leak on encoder close if not all frames are flushed r1906 Fix signedness bug in CPU detection Luckily didn't affect anything due to C signedness rules. r1905 Fix dumb bug caused by stray semicolon Caused noise reduction to run incorrectly in part of RD, but probably had no effect. r1904 Fix malloc of zero size Caused x264 to fail with some settings on systems that return a NULL pointer for malloc(0), like Solaris. r1903 Fix crash in mp4 muxer after failure of x264_encoder_open r1902 Fix shadowed variable warning in ffms.c r1901 Fix some Intel compiler warnings r1900 Fix 10L in r1886 Aspect ratio can't be set before SPS is initted. r1899 Improve update interval of x264cli progress information Now updates every 0.25s instead of every N frames. r1898 Windows: restore previous console title after encoding MSDN docs claim that SetConsoleTitle's effect is reverted when the process terminates, but this doesn't always work properly. Accordingly, manually revert the console title at the end of encoding. r1897 Allow WEIGHTP_FAKE in interlaced mode It seems to work fine as-is even though real weightp doesn't support interlacing yet. 
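The r1899 entry above switches progress output from a per-N-frames trigger to a time-based one (every 0.25s). A minimal sketch of that kind of throttle follows; it is not x264's actual code, and the `ProgressThrottle` class and its injectable `clock` parameter are hypothetical names used only for illustration.

```python
import time


class ProgressThrottle:
    """Allow a progress update at most once per `interval` seconds.

    Hypothetical helper, not part of x264. The clock is injectable so the
    behaviour can be checked without sleeping.
    """

    def __init__(self, interval=0.25, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self.last = None  # time of the last update; None before the first

    def ready(self):
        """Return True (and record the time) if an update is due now."""
        now = self.clock()
        if self.last is None or now - self.last >= self.interval:
            self.last = now
            return True
        return False
```

An encode loop would call `ready()` once per frame and print only when it returns True, so fast and slow encodes refresh the display at the same rate.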
r1896 Output pic struct information in libx264 API r1895 Enable FastShuffle on Penryn and Nehalem CPUs without SSE4 r1884 Hotfix for some bugs in VBV emergency r1883 Fix warnings in cpu.c r1882 Check for OS AVX support in addition to CPUID Even if not using ymm registers, AVX operations will cause SIGILLs on unsupported OSs. On Windows, AVX is only available on Windows 7 SP1 or later. r1881 VBV emergency mode Allow ratecontrol to select "quantizers" above the maximum. These "quantizers" progressively decimate the source to avoid VBV underflow. x264 is now VBV compliant even with input as evil as /dev/random. r1880 Initial AVX support Automatically handle 3-operand instructions and abstraction between SSE and AVX. Implement one function with this (denoise_dct) as an initial test. x264 can't make much use of the 256-bit support of AVX (as it's float-only), but 3-operand could give some small benefits. r1879 Double the base framerate for frame-sequential 3D files A 60fps frame-sequential 3D file is really only 30 FPS, just alternating between eyes. Accordingly, ratecontrol should treat it as if it was really 30 FPS. This will increase the bitrate at the same CRF level for such videos when --frame-packing 5 is used. r1878 Add --input-fmt option to lavf input Conforms to ffmpeg's `-f` option. Use this when lavf fails to guess the input format. r1877 Two improvements to regression test script Use SHA-1 hashes for temporary file names to avoid exceeding OS filename length limits. Correctly return to the original branch after testing if you were on a branch. r1876 Add some missing values to the non-extended SAR table r1875 Bump dates to 2011 r1874 More correctly write frame-packing SEI flags Bug reported by Nero. r1873 Don't die in x264_encoder_close if an error occurred in x264_encoder_encode Also clean up properly in x264.c (mostly useful for finding bugs in cleanup). 
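The r1877 entry above says the regression test script uses SHA-1 hashes for temporary file names so that long option strings do not exceed OS filename length limits. A rough sketch of the idea, assuming the name is derived from the option string; the `temp_name` function, the `x264-test-` prefix and the `.264` suffix are hypothetical and not taken from the actual script.

```python
import hashlib


def temp_name(options, suffix=".264"):
    """Map an arbitrarily long option string to a fixed-length file name.

    Hypothetical helper: a SHA-1 hex digest is always 40 characters, so the
    resulting name has the same length no matter how many options are tested.
    """
    digest = hashlib.sha1(options.encode("utf-8")).hexdigest()
    return "x264-test-" + digest + suffix
```

The same option string always yields the same name, while different option strings yield different names for all practical purposes.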
r1872 Fix reconfiguration of b_tff Attempting to change field order during encoding could cause slight corruption. Also fix delta_poc_bottom to be correctly set if interlaced mode is used without B-frames. r1871 Fix x264 CPU detection with >=64 CPUs on Windows x264 won't actually use more than one processor group's worth of CPUs, however. This isn't a problem, as a single x264 instance can't effectively use a full 64 cores anyways. r1870 Remove high bit depth mmx quant It was using pmuludq which is sse2, and the function isn't really possible without pmuludq. r1869 Fix cacheline check in avg2 w20 cache32 Didn't result in incorrect output, only slightly decreased speed on a few obsolete systems. r1868 instruction in high bit depth ssd_nv12_mmxext r1867 VFR/framerate-aware ratecontrol, part 2 MB-tree and qcomp complexity estimation now consider the duration of a frame in their calculations. This is very important for visual optimizations, as frames that last longer are inherently more important quality-wise. Improves VFR-aware PSNR as much as 1-2db on extreme test cases, ~0.5db on more ordinary VFR clips (e.g. deduped anime episodes). WARNING: This change redefines x264's internal quality measurement. x264 will now scale its quality based on the framerate of the video due to the aforementioned frame duration logic. That is, --crf X will give lower quality per frame for a 60fps video than for a 30fps one. This will make --crf closer to constant perceptual quality than previously. The "center" for this change is 25fps: that is, videos lower than 25fps will go up in quality at the same CRF and videos above will go down. This choice is completely arbitrary. Note that to take full advantage of this, x264 must encode your video at the correct framerate, with the correct timestamps. r1866 Improve reference ordering in interleaved 3D video Provides a decent compression improvement when encoding interleaved 3D content (--frame-packing 5). 
Helps more without B-frames and at lower bitrates. Note that x264 will not do this optimization unless --frame-packing 5 is used to tell x264 that the source is interleaved 3D. Tests consistently show that interleaved frame packing is by far the best way to compress 3D content. It gives a ~35-50% compression benefit over separate streams or top/bottom or left/right coding. Also finally add support for L1 reference reordering (in B-frames). Also add support for reordered ref0 in L0 and L1 lists; could be useful in the future for other things. r1865 Cosmetics: fref0/1 -> fref[2] and i_ref0/1 -> i_ref[2] A much-needed refactoring, plus makes the next patch easier. r1864 Check an extra offset during weightp analysis Up to 0.1 - 0.6 dB gain on some fade-ins with --weightp 1, less with --weightp 2. r1863 SSE2 high bit depth SSIM functions Patch from Google Code-In. r1862 SSE2 high bit depth intra_predict_(8x8c|16x16)_p Patch from Google Code-In. r1861 MMX high bit depth coeff_last4 Patch from Google Code-In. r1860 SSE2 high bit depth zigzag_interleave_cavlc Patch from Google Code-In. r1859 MMX/SSE2/SSSE3 high bit depth frame_init_lowres functions Patch from Google Code-In. r1858 MMX high bit depth 4x4 intra predict functions DDR and HD directions, as well as making HU faster. Also enable some SSE2 versions of high bit depth functions that were added but not properly enabled. Patch from Google Code-In. r1857 SSE2 high bit depth 8x8 intra predict functions DDL, DDR, VR, HU, and HD directions, as well as the 8x8 filter. Also make 8-bit MMX VR faster, by backporting the optimizations from the high bit depth version. Patch from Google Code-In. r1856 MMX/SSE2 high bit depth 8x8c intra predict functions Patch from Google Code-In. r1855 MMX version of high bit depth plane_copy And various cosmetics. 
Patch from Google Code-In r1854 Faster x86 predict_8x8c_dc, MMX/SSE2 high bit depth versions r1853 SSSE3 high bit depth sad_aligned functions r1852 MMX/SSE2 high bit depth interleave functions Patch from Google Code-In. r1851 MMX/SSE2 high bit depth avg functions Patch from Google Code-In. r1850 MMX/SSE2 high bit depth deinterleave functions Patch from Google Code-In r1849 Shut up some incorrect gcc uninitialized variable warnings r1848 Write --crop-rect and --frame-packing options to x264 SEI r1847 Add missing space to parameter SEI r1846 Fix typo in documentation r1845 Fix redundant linebreaks in statsfile with weightp r1844 Use cross_prefix for strings in endian test and as test r1843 Fix checkasm test for quant in high bit depth Eliminate some spurious failures. r1842 Fix broken YV12 handling in the resize filter r1841 Fix bug with negative lookahead mb costs in high bit depth r1840 Fix overflow in SSIM calculation in 10-bit r1839 Fix some possible overflows in VFR ratecontrol with extreme timebases r1838 Fix memory leak in lavf demuxer. Leak only occurred with input files that have more than one video stream. r1837 Fix satd predictors with high bit depth Resulted in odd CRF-mode results with --no-mbtree, as well as suboptimal VBV handling. r1836 Fix compile error with high bit depth and disable-asm r1835 Really fix gcc win32 misalignment crash gcc's -fno-zero-initialized-in-bss only works if an explicit initializer (e.g. = {0}) is used. r1834 Support for native Windows threads Patch originally by Pegasys Inc. r1833 MMX/SSE2 high bit depth weight_cache/offset(sub|add) functions Patch from Google Code-In. r1832 SSE2 high bit depth dequant functions Patch from Google Code-In. r1831 SSE2 high bit depth zigzag functions Patch from Google Code-In. r1830 MMX/SSE2 versions of high bit depth store_interleave Patch from Google Code-In. 
r1829 Add frame-packing SEI support for signalling 3D video r1828 Allow 8x8dct+cavlc+lossless with subme>=6 r1827 Add interlaced/no-interlaced case to regression test script r1826 Save more memory with weightp in >8-bit r1825 .gitignore more untracked file types r1824 Work around gcc/ld alignment bug on win32 Fixes problems due to misalignment of static zero arrays (win32 ld can't align .bss properly). r1823 Fix high bit depth intra pred functions And re-enable them accordingly. Patch from Google Code-In. r1822 Fix weightp analysis with high bit depth r1821 Fix build error in high depth Caused by multiple definitions of x264_add8x8_idct_sse2. r1820 Hotfix for high bit depth Temporary fix for some unaligned access crashes. r1819 Delete x264_config.h on distclean r1818 Tons of high bit depth intra predict asm Patch from Google Code-In. r1817 SSE2 high bit depth 8x8/16x16 idct/idct_dc Patch from Google Code-In. r1816 Create and install x264_config.h This header can be used to determine the bit-depth and license of libx264. r1815 Detect Avisynth initialization failures Detect if there is a critical Avisynth initialization failure and print the associated error. This, however, requires a feature present in the latest version of Avisynth alpha (2.6). Previous versions are unaffected. r1814 Automatically restrict QPs to avoid quantization (under|over)flow --cqm jvt and similar should now work "out of the box" instead of requiring futzing with --qpmin. r1813 Don't try to get timecodes if reading frame failed This fixes "input timecode file missing data for frame" warning with piped input where we don't know total number of frames. r1812 Fix possible overflow in sub4x4_dct in 10-bit builds r1811 Fix bug in intra-refresh + threads Intra refresh bar quality increase wasn't correctly applied. r1810 Fix file handle leak in libx264 on error r1809 Fix incompatible csp format issue Problem occurred with unknown pixel formats and non mod2 resolutions in the resize filter. 
r1808 Really fix fittobox resize rounding code r1807 Fix regression in rev1549 Skip auto timebase denominator generation when generated timebase denominator exceeds UINT32_MAX. Also fix double free. r1806 Fix --tcfile-in if timecode v2 file starts from nonzero pts r1805 SPARC/Solaris build fixes r1804 Fix typo in r1797 r1803 Add Python regression test script Patch from Google Code-In. r1802 Make --weightp 1 a better speed tradeoff Since fade analysis is now so fast, weightp 1 now does fade analysis but no reference duplication. This is the opposite of what it used to do (reference duplication but no fade analysis). This also gives weightp's better fade quality to faster presets (up to superfast). r1801 SSE versions of some high-bit-depth DCT functions Our first Google Code-In patch! r1800 Clean up weightp analysis function r1799 Add API function to return max number of delayed frames r1798 Copy field order flag in encoder_reconfig r1797 Cosmetics in configure r1796 Add some more info to `x264 --version` r1795 Change qpmin default to 0 There's probably no real reason to keep it at 10 anymore, and lowering it allows AQ to pick lower quantizers in really flat areas. Might help on gradients at high quality levels. The previous value of 10 was arbitrary anyways. r1794 Fix ticks_per_frame check for VFR input r1793 Fix configure so that boolean configuration options are 1/0 There are many cases of 1/undef, not 1/0. r1792 Only build SPARC VIS asm if high bit-depth is disabled r1791 Fix build on SPARC Solaris 10 r1790 Fix resize filter rounding code r1789 Fix regression in chroma weightp Missing cache calls could cause artifacts, encoder/decoder desync. r1788 Fix some crashes with high bit depth Not all arrays were sufficiently aligned. r1787 Chroma weighted prediction Like luma weighted prediction, dramatically improves compression in fades. Up to 4-8db chroma PSNR gain in extreme cases (short, perfect fade-outs). On actual videos, helps up to ~1% overall. 
One example video with a decent number of fades (ef OP): 0.8% bitrate reduction overall, 7% bitrate reduction just counting chroma. Fixes a lot of artifacts in fades at lower bitrates. Original patch by Dylan Yudaken <dyudaken@gmail.com>. r1786 Support custom cropping rectangles Supposedly useful for 3D television applications. r1785 Convert X264_HIGH_BIT_DEPTH to HIGH_BIT_DEPTH Less verbose. r1784 x86 asm for high-bit-depth pixel metrics Overall speed change from these 6 asm patches: ~4.4x. But there's still tons more asm to do -- patches welcome! Breakdown from this patch: ~13x faster SAD than C. ~11.5x faster SATD than C (only MMX done). ~18.5x faster SA8D than C. ~19.2x faster hadamard_ac than C. ~8.3x faster SSD than C. ~12.4x faster VAR than C. ~3-4.2x faster intra SAD than C. ~7.9x faster intra SATD than C. r1783 x86 asm for some high-bit-depth coefficient functions ~7.9x faster denoise than C. ~2.3x faster coeff_level_run than C. ~6.6x faster coeff_last than C. ~4.3x faster decimate_score than C. Also improve checkasm's decimate_score test. r1782 x86 asm for high-bit-depth motion compensation ~8x faster qpel MC than C. ~10x faster hpel than C. r1781 x86 asm for high-bit-depth quant ~3.1-4.2x faster than C. r1780 x86 asm for high-bit-depth DCT Only MMX and DCT done so far; iDCT still needs asm as well. ~4.4x faster than C. r1779 x86 asm for high-bit-depth deblocking ~3.3x faster than C. r1778 Use a 16-bit buffer in hpel_filter regardless of bit depth This only works up to and including 10-bit (but we don't support anything higher yet). r1777 Use enums instead of magic numbers in x264_mb_partition_pixel_table r1776 Improve configure script logging Now prints the test program that failed in addition to error messages. 
r1775 Fix constrained intra pred mode selection r1774 Various high-bit-depth ratecontrol fixes r1773 Fix a crash in --dump-yuv for odd resolutions r1772 Improve flash detection algorithm change in r1765 Now disables scenecuts only near the real end of the video, not just prior to forced keyframes. r1771 Update ffms2 support for its latest API break. r1770 Modify the x264 header accordingly if --disable-gpl is used r1769 Save a bit of memory with weightp + high bit depth r1768 Fix bugs in qpfile parsing with omitted QPs r1767 Fix HRD with intra-refresh x264 was incorrectly calculating cpb_removal_delay with respect to the first keyframe. It should have been calculating cpb_removal_delay with respect to the last keyframe. r1766 Fix bug in r1753 Overflow compensation fix broke CRF with --no-mbtree. r1765 Improve flash detection's behavior near the end of the video Flash detection catches situations like AAAABBCCDDDD, where A,B,C,D are frames in different scenes. x264 would place a keyframe on the first "D". However, if the video ended on the last "C", x264 would place a keyframe on the first "C", even though C classifies as a flash. This change fixes this issue. r1764 Improve quantizer handling The default value for i_qpplus1 in x264_picture_t is now X264_QP_AUTO. This is currently 0, but may change in the future. qpfiles no longer use -1 to indicate "auto"; QP is just omitted. The old method should still work though. CRF values now make sense in high bit depth mode. --qp should be used for lossless mode, not --crf. --crf 0 will still work as expected in 8-bit mode, but won't be lossless with higher bit depths. Add bit depth to statsfiles. These changes are required to make the QP interface sensible in combination with high bit depth. r1763 VFR-aware PSNR/SSIM measurement First step to VFR-aware MB-tree and bit allocation.
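The AAAABBCCDDDD example in the r1765 entry above can be reproduced with a toy model. This is only an illustration of the intended outcome, not x264's detector, which has no scene labels to work from; the `keyframes` function and the `min_scene_len` threshold are hypothetical. A run of identical labels shorter than the threshold is treated as a flash and never receives a keyframe, including when it is the last run in the video.

```python
from itertools import groupby


def keyframes(scene_labels, min_scene_len=3):
    """Return frame indices where a non-flash scene starts.

    Toy model: `scene_labels` is a string with one letter per frame, and a
    scene shorter than `min_scene_len` frames counts as a flash.
    """
    result = []
    pos = 0
    for _, run in groupby(scene_labels):
        length = len(list(run))
        if length >= min_scene_len:
            result.append(pos)  # first frame of a real scene
        pos += length
    return result
```

With the default threshold, `keyframes("AAAABBCCDDDD")` marks the first "A" and the first "D", and `keyframes("AAAABBCC")` marks only the first "A", which is the behaviour the entry describes after the fix.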
r1762 Disable weightp offset=-1 dupes with high bit depth They're a hack to compensate for crappy rounding, and thus not worth doing at high bit depth, which fixes most of the rounding issues. r1761 Make the ffmpeg -vpre error message more descriptive r1760 Add numeric names for the presets (0==ultrafast ... 9==placebo) This mapping will of course change if new presets are added in between, but will always be ordered from fastest to slowest. r1759 Update benchmarks in doc/threads.txt r1758 Make the #if'd out naive ESA actually match the real implementation r1757 Move mv/ref prefetch code to the correct location Prefetching of top blocks should be done under if(top), not if(left). r1756 Link x264cli explicitly against lavf Fixes some problems with crappy linkers. r1755 Fix CBR ratecontrol bug with extremely high qscales Caused CBR ratecontrol to take a very long time to recover from extreme situations (e.g. /dev/urandom). r1754 Disable overflow compensation in CRF mode Wasn't designed with CRF in mind, and acts really weird with CRF+VBV. r1753 Fix stupid bug in B-frame VBV size prediction r1752 Fix regression in checkasm in r1666 Buffer is uint16_t* regardless of whether x264 was compiled with high bit depth or not. r1751 Fix overflows in satd, sa8d and hadamard_ac with high bit depth r1750 Fix potential problem with overflows in ssd_nv12 The risk of overflows increases exponentially with the bit depth. The 8-bit asm versions may still overflow with image widths >= 11008 (or 6604 if interlaced). r1749 Fix syntax for some parameterless functions Technically, such functions should be declared with (void), not (). r1748 Fix fps reporting on mingw64 _ftime on mingw64 uses __timeb32 which is broken. Use ftime instead. r1747 Fix compilation on PPC with some recent GCCs r1746 Fix Altivec SATD with small strides Fixes chroma ME and some of lookahead on PPC. r1745 Address remaining cacheline split issues in avg2 Slightly improved performance on core 2. 
Also fix profiling misattribution of w8/16/20 mmxext cacheline loops. r1744 Trim a few bytes off some x86 intra pred functions r1743 Move DTS compression from libx264 to x264cli DTS compression is an ugly stupid hack and starting to encroach on unrelated areas like VBV. Some people want it in the mp4 muxer for devices and/or splitters that don't support Edit Boxes. We just say "throw these broken devices out the window". DTS compression will remain as a muxer option, --dts-compress, at the user's own risk. This option is disabled by default. r1742 Use a larger pic_init_qp with high bit depth Modify pic_init_qs for consistency. r1741 Update some of the information in doc/ r1740 Update header in depth.c r1739 Remove some old unused stuff in the build tree Regression test (hasn't been updated since svn). Doxy (was never used). r1738 Various cosmetics Exorcise some CamelCase. r1737 Add missing mod4 stack check to sse2_misalign mc_chroma Required for ICC compilation. r1736 Fix 2pass ratecontrol with --nal-hrd cbr r1735 Fix minor bug in intra pred with intra refresh i8x8 blocks didn't properly avoid predicting from top-right when necessary. This could cause intra refresh to not completely refresh the frame. r1734 Fix filter parsing with --extra-cflags="-DNDEBUG" r1733 Make sigint handler variable volatile Didn't actually cause any problems, but is necessary because it can be modified by another thread (the signal call). r1732 Add High 10 Intra profile support (AVC-Intra) x264 should now be able to encode compliant AVC-Intra 50. With a 10-bit-compiled version of x264, a sample commandline for 1080i25 might be: --interlaced --keyint 1 --vbv-bufsize 2000 --bitrate 50000 --vbv-maxrate 50000 --nal-hrd cbr Also print "Constrained Baseline" for baseline profile, since that's all x264 (and everything else in the world) supports. Also reorganize parameter validation a bit to reduce some spurious warnings. 
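The sample AVC-Intra 50 commandline in the r1732 entry above can be assembled programmatically. The sketch below only builds and prints the argument list, using the options quoted in that entry; the `avc_intra_50_args` function and the file names `input.y4m` and `output.264` are hypothetical, and a 10-bit build of x264 is assumed, as the entry states.

```python
import shlex


def avc_intra_50_args(input_path, output_path):
    """Build the x264 argument list for the 1080i25 sample in r1732.

    Hypothetical helper; the option values are the ones quoted in the
    changelog entry, and the paths are supplied by the caller.
    """
    return [
        "x264",
        "--interlaced",
        "--keyint", "1",            # every frame is a keyframe (intra-only)
        "--vbv-bufsize", "2000",
        "--bitrate", "50000",
        "--vbv-maxrate", "50000",
        "--nal-hrd", "cbr",
        "-o", output_path,
        input_path,
    ]


# Print the command rather than running it, so the sketch has no side effects.
print(shlex.join(avc_intra_50_args("input.y4m", "output.264")))
```

Passing the list to `subprocess.run` would launch the encode; that step is left out here because it depends on a locally installed 10-bit x264 binary.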
r1731 Finish support for high-depth video throughout x264 Add support for high depth input in libx264. Add support for 16-bit colorspaces in the filtering system. Add support for input bit depths in the interval [9,16] with the raw demuxer. Add a depth filter to dither input to x264. r1730 Chroma mode decision/subpel for B-frames Improves compression ~0.4-1%. Helps more on videos with lots of chroma detail. Enabled at subme 9 (preset slower) and higher. r1729 Various cosmetics r1728 Make slice-max-size more aggressive in considering escape bytes The x264 assumption of randomly distributed escape bytes fails in the case of CABAC + an enormous number of identical macroblocks. This patch attempts to compensate for this. It is probably safe to assume in calling applications that x264 practically never violates the slice size limitation. r1727 Add missing emms for dump-yuv r1726 Fix CFR ratecontrol with timebase != 1/fps Fixes VBV + DTS compression, among other things. r1725 Fix DTS/bitrate calculation if the first PTS wasn't zero Fix bitrate calculation with DTS compression. r1724 Fix regression in r1716 r1723 Cosmetics in me.c and frame.c r1722 Add support for arbitrary user SEIs This allows calling applications to insert SEIs that x264 doesn't know about while maintaining HRD/VBV accuracy. r1721 Add full chroma input flag to swscale Improves quality of colorspace conversions involving RGB(A). r1720 Add --disable-gpl option to configure Used for commercially-licensed versions of x264. Doesn't currently change anything, but may be used to disable GPL-only CLI tools, such as video filters, in the future. Also print the x264 license and libavformat license in version info. r1719 Update source file headers Update dates, improve file descriptions, make things more consistent. Also add information about commercial licensing. r1718 Fix intra refresh to not exceed max recovery_frame_cnt The spec constrains recovery_frame_cnt to [0, MaxFrameNum-1]. 
So make MaxFrameNum bigger in the case of intra refresh. r1717 Make intra refresh finish one frame faster In some cases, the last frame of intra refresh was redundant. Saves a few bits. r1716 Fix intra refresh to not predict from invalid pixels The blocks on the right side of the intra refresh column should not predict from top-right. r1715 Add configure check for mingw64 prefixing This compensates for the inconsistent prefixing seen in different versions of the compiler. r1714 Update some Altivec function prototypes Silences a lot of warnings. r1713 Add support for level 1b This level is a stupid hack in the H.264 spec, so it's a stupid hack in x264 too. Since level is an integer, calling applications need to set level_idc=9 to use it. String-based option handling will accept "1b" just fine though, so CLI users don't have to worry. r1712 Use smaller values for idr_pic_id Saves a few bits and fixes problems on certain fantastically terrible decoders, such as the Apple iPad. r1711 Use POC type 2 for streams with no B-frames Saves a few bits per slice header. r1710 Faster cabac_encode_ue_bypass Use CLZ + a lut instead of a loop. r1709 Faster nal_escape asm r1708 Allow --demuxer forcing with known extensions r1707 Minor fixes/cosmetics in commandline parsing r1706 Fix overflow in stats printing r1705 Fix bug in 2pass if the first P-frames are all skip last_qscale_for was read before being initialized in this case, resulting in the value from the previous iteration being used instead. r1704 Don't do deblock-aware RD if deblocking is off r1703 CAVLC "trellis" ~3-10% improved compression with CAVLC. --trellis is now a valid option with CAVLC. Perhaps more importantly, this means psy-trellis now works with CAVLC. This isn't a real trellis; it's actually just a simplified QNS. But it takes enough shortcuts that it's still roughly as fast as a trellis; just not quite optimal.
Thus the name is a bit of a misnomer, but we're reusing the option name because it does the same thing. A real trellis would be better, but CAVLC is much harder to trellis than CABAC. I'm not aware of any published polynomial-time solutions that are significantly close to optimal. r1702 Add global #define for maximum reference count This should make it easier to play around with reference frame counts that exceed the spec maximum. r1701 Simplify addressing logic for interlaced-related arrays In progressive mode, just make [0] and [1] point to the same place. r1700 Add missing emms to x264_nal_encode Only matters for applications using the low-latency callback feature. r1699 Fix 2 bugs with slice-max-size Macroblock re-encoding didn't restore mv/tex bit counters (slightly inaccurate 2-pass). Bitstream buffer check didn't work correctly (insanely large frames could break encoding). r1698 NV12 version of Altivec chroma MC r1697 Deblock-aware RD Small quality gain (~0.5%) at lower bitrates, potentially larger with QPRD. May help more with psy, maybe not. Enabled at subme >= 9. Small speed cost (a few %). r1696 Correct X header path usage in configure Don't unconditionally set the header path for OpenBSD but do so if the --enable-visualize flag is specified. r1695 Fix lavf input with delayed frames r1694 Slightly improve the filtering section of x264 --help r1693 Fix debug message typo with DTS compression r1692 Try to guess input length for lavf input Allows printing of progress indicator when using lavf input. r1691 Workaround bug in fps/timestamp handling with lavf input reordered_opaque in lavf doesn't work correctly in the identity case (no reordering). Fixes incorrect output for some file types (e.g. raw in mov). r1690 Fix aspect ratio writing in the MKV muxer The braindead Matroska spec dictates aspect ratio to be measured in pixels instead of, well, an actual aspect ratio.
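The r1690 entry above notes that Matroska stores aspect ratio as display dimensions in pixels rather than as a ratio. A small sketch of one way to turn a sample aspect ratio (SAR) into such dimensions; this is an illustration under stated assumptions, not the muxer's actual code. The `display_size` function is hypothetical, and the choice to stretch one dimension and round down is an assumption.

```python
def display_size(width, height, sar_num, sar_den):
    """Convert coded size plus SAR into display width and height in pixels.

    Hypothetical helper: only one dimension is enlarged, so the coded
    resolution is never reduced. Results are rounded down.
    """
    if sar_num <= 0 or sar_den <= 0:
        raise ValueError("SAR terms must be positive")
    if sar_num > sar_den:
        # Pixels are wider than tall: stretch the width.
        return width * sar_num // sar_den, height
    if sar_num < sar_den:
        # Pixels are taller than wide: stretch the height.
        return width, height * sar_den // sar_num
    return width, height  # square pixels
```

For example, a 720x480 frame with SAR 32:27 maps to 853x480, and the same frame with SAR 8:9 maps to 720x540.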
r1689 Add libavcore check in configure r1688 Improve quantizer distribution with sliced-threads+VBV Should help avoid cases of very uneven quantizer choice between slices. r1687 Remove dead code in slicetype.c r1686 Fix incorrect duration/framerate/bitrate in flv header r1685 invalidate_reference fixes invalidate_reference didn't actually invalidate the immediate previous frame, only frames that came before that. Make sure that reordering is forced when invalidate_reference is used, so that the reference list is correct decoder-side. r1684 Filtering system-related fixes Fix configure to check for outdated libavutil in resize filter support. Do not print an explicit error message in ffms when requesting a frame beyond the number of frames in the source. Mention in --*help that filtering options can be specified as name=value. Fix the shadowing warning in the resize filter on posix systems. r1683 Improve reference_invalid support Reference invalidation can now be used to invalidate multiple frames at a time, rather than being limited to one per encoder_encode call. r1682 Eradicate all mention of SI/SP-frames r1681 Fix stack alignment with MB-tree Broke 2-pass with MB-tree when calling from compilers with broken stack alignment (e.g. MSVC). r1680 Avisynth 2.6 colorspace support Use a customized avisynth_c.h to detect the new planar colorspaces. r1679 Prevent some cases of cache aliasing. Avoid cases where image strides were a large power of 2. Core 2: +3% speed at widths 898..960, +6% at widths 1922..1984, most other resolutions unaffected. Nehalem and AMD: similar amount of speedup, but fewer resolutions affected. r1678 Fix stack alignment for adaptive quant Broke calls from compilers with broken stack alignment (e.g. MSVC). r1677 Fix compilation with shared ffmpeg libs lavf input uses libavutil functions, so it must request flags for libavutil from pkg-config. r1676 Fix another PCM bug CABAC assumes that NNZ is 0 or 1, not the number of actual nonzero coefficients. 
Didn't actually break the output; only had a tiny effect on RD. r1675 Fix regression in r1666 Broke encoding of PCM macroblocks. r1674 Fix build with bit_depth > 8 Definition of x264_cli_plane_copy was inconsistent with declaration. r1673 Convert x264 to use NV12 pixel format internally ~1% faster overall on Conroe, mostly due to improved cache locality. Also allows improved SIMD on some chroma functions (e.g. deblock). This change also extends the API to allow direct NV12 input, which should be a bit faster than YV12. This isn't currently used in the x264cli, as swscale does not have fast NV12 conversion routines, but it might be useful for other applications. Note this patch disables the chroma SIMD code for PPC and ARM until new versions are written. r1672 Add video filtering system to x264cli Similar to mplayer's -vf system. Supports some basic operations like resizing and cropping. Will support more in the future. See the help for more details. r1671 Eliminate edge cases for MV predictors Saves a few clocks in mv pred. r1670 Improve scenecut detection a bit Put a minimum value on the scenecut threshold; makes x264 more likely to catch successive scenecuts (but might increase the odds of false detection). This also fixes scenecut detection with keyint=infinite. Also print keyint=infinite in the x264 SEI and statsfile correctly. r1669 Fix 8x8dct+slices+no sliced threads+cavlc+deblock Deblocking was done slightly incorrectly. Regression in r1612. r1668 Fix off-by-one error in slice VBV predictor updates r1667 Fix disabling of progress with --log-level r1666 Support for 9 and 10-bit encoding Output bit depth is specified at compilation time via --bit-depth. There is currently almost no assembly code available for high-bit-depth modes, so encoding will be very slow. Input is still 8-bit only; this will change in the future. Note that very few H.264 decoders support >8 bit depth currently.
Also note that the quantizer scale differs for higher bit depth. For example, for 10-bit, the quantizer (and crf) ranges from 0 to 63 instead of 0 to 51. r1665 Support infinite keyint (--keyint infinite). This just means x264 won't insert non-scenecut keyframes. Useful for streaming when using interactive error recovery or some other mechanism that makes keyframes unnecessary. Also change POC logic to limit POC/framenum LSB size (to save bits per slice). Also fix a bug in the CPB underflow detection code (didn't affect the bitstream, just resulted in the failure to print certain warning messages). r1664 Don't check i16x16 planar mode unless previous modes were useful Saves ~160 clocks per MB at subme=1, ~270 per MB at subme>1 (measured on Core i7). Negligible effect on compression. Also make a few more arrays static. r1663 Centralize logging within x264cli x264cli messages will now respect the log level they pertain to. Slightly reduces binary size. r1662 Make open-GOP Blu-ray compatible Blu-ray is even more braindamaged than we thought. Accordingly, open-gop options are now "normal" and "bluray", as opposed to display and coded. Normal should be used in all cases besides Blu-ray authoring. r1661 Callback feature for low-latency per-slice output Add a callback to allow the calling application to send slices immediately after being encoded. Also add some extra information to the x264_nal_t structure to help inform such a calling application how the NAL units should be ordered. Full documentation is in x264.h. r1660 Simplify pixel_ads r1659 Interactive encoder control: error resilience In low-latency streaming with few clients, it is often feasible to modify encoder behavior in some fashion based on feedback from clients. One possible application of this is error resilience: if a packet is lost, mark the associated frame (and any referenced from it) as lost. This allows quick recovery from errors with minimal expense bit-wise.
The new i_dpb_size parameter allows a calling application to tell x264 to use a larger DPB size than required by the number of reference frames. This lets x264 and the client keep a large buffer of old references to fall back to in case of lost frames. If no recovery is possible even with the available buffer, x264 will force a keyframe. This initial version does not support B-frames or intra refresh. Recommended usage is to set keyint to a very large value, so that keyframes do not occur except as necessary for extreme error recovery. Full documentation is in x264.h. Move DTS/PTS calculation to before encoding each frame instead of after. Improve documentation of x264_encoder_intra_refresh. r1658 Lookaheadless MB-tree support Uses past motion information instead of future data from the lookahead. Not as accurate, but better than nothing in zero-latency compression when a lookahead isn't available. Currently resets on keyframes, so only available if intra-refresh is set, to avoid pops on non-scenecut keyframes. Not on by default with any preset/tune combination; must be enabled explicitly if --tune zerolatency is used. Also slightly modify encoding presets: disable rc-lookahead in the fastest presets. Enable MB-tree in "veryfast", albeit with a very short lookahead. r1657 Open-GOP support Allows B-frames immediately prior to keyframes (in display order). This helps reduce keyframe popping and improve compression with short keyframe intervals. Due to a staggering display of braindamage in the Blu-ray spec, two open-GOP modes are available. The two modes calculate keyframe interval differently: one based on coded distance and one based on display distance. The latter is superior compression-wise, but for no comprehensible reason, Blu-ray requires the former if open-GOP is used. r1656 Use threadpools to avoid unnecessary thread creation Tiny performance improvement with fast settings and lots of threads. May help more on some OSs with slow thread creation, like OS X. 
Unify inconsistent synchronized abbreviations to sync. r1655 Improve 2-pass bitrate prediction Adapt based on distance to the end in bits, not in frames. Helps in videos with absurdly simple end sections, e.g. black frames. r1654 SSE4 and SSSE3 versions of some intra_sad functions Primarily Nehalem-optimized. r1653 Improve HRD accuracy In a staggering display of brain damage, the spec requires all HRD math to be done in infinite precision despite the output being of quite limited precision. Accordingly, convert buffer management to work in units of timescale. These accumulating rounding errors probably didn't cause any real problems, but might in theory cause issues in very picky muxers on extremely long-running streams. r1652 Use -fno-tree-vectorize to avoid miscompilation Some versions of gcc have been reported to attempt (and fail) to vectorize a loop in plane_expand_border. This results in a segfault, so to limit the possible effects of gcc's utter incompetence, we're turning off vectorization entirely. It's not like it ever did anything useful to begin with. r1651 Fix SIGPIPEs caused by is_regular_file checks Check to see if input file is a pipe without opening it. r1650 Fix compilation on ARM w/ Apple ABI r1649 Faster mbtree_propagate asm Replace fp division by multiply with the reciprocal. Only ~12% faster on penryn, but over 80% faster on amd k8. Also make checkasm slightly more tolerant to rounding error. r1648 Convert the OPT_ defines in x264.c to an enum r1647 Don't allow baseline profile streams with fake-interlaced Indicate use of --fake-interlaced in encoding options SEI. r1646 Allocate space for null terminator in param_apply_tune r1645 Fix regression in r1501. Could cause slightly incorrect analysis in rare cases, but no serious encoding issues. Also shut up gcc warning about pels_v. r1644 Fix crash with --subme 0 + --weightp > 0. 
Regression in r1535 r1643 Replace some divisions with shifts r1642 Warn about shadowed variable declarations Also get rid of a few instances of variable shadowing. r1641 Template load_pic_pointers based on interlaced Significantly speeds up cache_load in the non-interlaced case. Also various other minor optimizations in cache_load and cache_save. r1640 Remove double-dereferences for MB width/height data Store it in x264_t instead of going through the SPS. r1639 Exempt Win x86_64 from memalign hack The API mandates all mallocs are 16 byte aligned. Remove unused int that stores sizeof malloc in memalign hack. r1638 Preprocessing cosmetics Unify input/output defines to HAVE_* format. Define values as 1 to simplify conditionals. r1637 Take more shortcuts in i4x4/i8x8 analysis Based on the scores of the H and V modes, rule out modes which are unlikely. Small compression loss (0.1-0.5%) and large speed gain (10-30% faster intra analysis). Not enabled in slower encoding modes. Also make C versions of the merged SATD functions in order to eliminate branches based on their availability. r1636 Display SSIM measurement in dB as well r1635 indicate "M" for local commits too :Sun Jun 6 15:21:12 2010 +0800 Add error message for invalid [de]muxer selection r1633 Deduplicate the ALIGN macro, move it to common.h r1632 Fix a use of ALIGNED_ARRAY_16 on ARM r1631 Add missing emms after nal_encode Caused random, bizarre failures with some calling applications. r1630 Fix crash in fake-interlaced at some resolutions r1629 Fix no-mbtree + aq-mode=0 Regression in r1618. r1628 Add API function to fix x264_picture_t initialization Calling applications that do not use x264_picture_alloc need to use x264_picture_init to initialize x264_picture_t structures. Previously, if the calling application didn't zero x264_picture_t, Bad Things could happen. r1627 Fix Avisynth input Regression in r1624. A more permanent solution to the problem will be committed later. 
r1626 Convert to a unified "dctcoeff" type for DCT data Necessary for future high bit-depth support. r1625 Convert to a unified "pixel" type for pixel data Necessary for future high bit-depth support. Various macros and extra types have been introduced to make operations on variable-size pixels more convenient. r1624 Add API tool to apply arbitrary quantizer offsets The calling application can now pass a "map" of quantizer offsets to apply to each frame. An optional callback to free the map can also be included. This allows all kinds of flexible region-of-interest coding and similar. r1623 x86 assembly code for NAL escaping Up to ~10x faster than C depending on CPU. Helps the most at very high bitrates (e.g. lossless). Also make the C code faster and simpler. r1622 Re-enable i8x8 merged SATD Accidentally got disabled when intra_sad_x3 was added. r1621 Some deblocking-related optimizations r1620 Optimize out some x264_scan8 reads r1619 Add fast skip in lookahead motion search Helps speed very significantly on motionless blocks. r1618 Merge some of adaptive quant and weightp Eliminate redundant work; both of them were calculating variance of the frame. r1617 Fix omission in libx264 tuning documentation r1616 Fix ultrafast to actually turn off weightb r1615 Fix crash with MP4-muxing if zero frames were encoded r1614 Fix cavlc+deblock+8x8dct (regression in r1612) Add cavlc+8x8dct munging to new deblock system. May have caused minor visual artifacts. r1613 Fix 10L in r1612 Stats need to be calculated before deblock strength, not after. Broke ref stats in x264cli (no effect on actual output). r1612 Overhaul deblocking again Move deblock strength calculation to immediately after encoding to take advantage of the data that's already in cache. Keep the deblocking itself as per-row. r1611 Detect Atom CPU, enable appropriate asm functions I'm not going to actually optimize for this pile of garbage unless someone pays me. 
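The r1666 entry above notes that the quantizer (and crf) range grows from 0-51 at 8-bit to 0-63 at 10-bit. A minimal Python sketch of that arithmetic follows. It assumes the range grows by 6 quantizer steps per extra bit of depth, which matches the two ranges quoted in the changelog; the function name `max_qp` is hypothetical and is not part of x264.

```python
def max_qp(bit_depth: int) -> int:
    """Upper end of the quantizer/CRF range for a given output bit depth.

    Assumption: the range widens by 6 steps per bit above 8, which
    reproduces the 0-51 (8-bit) and 0-63 (10-bit) ranges quoted above.
    """
    if bit_depth < 8:
        raise ValueError("bit depths below 8 are not covered by this sketch")
    return 51 + 6 * (bit_depth - 8)


if __name__ == "__main__":
    for depth in (8, 9, 10):
        print(f"{depth}-bit: quantizer range 0..{max_qp(depth)}")
```

In practice this means a crf value tuned on an 8-bit build does not carry over unchanged to a 10-bit build.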
But it can't hurt to at least enable the correct functions based on benchmarks. Also save some cache on Intel CPUs that don't need the decimate LUT due to having fast bsr/bsf. r1610 Slightly faster mbtree asm r1609 Faster deblock strength asm on conroe/penryn r1608 Avoid an extra var2 in chroma encoding if possible Also remove a redundant if. r1607 Avoid a redundant qpel check in lookahead with subme <= 1. r1606 Fix ABR rate control calculations Incorrect frame numbers were used, resulting in slightly inaccurate ratecontrol. r1605 Fix calculation of total bitrate printed after stop by CTRL+C r1604 Fix typo in fake-interlaced documentation r1603 Fix CABAC+PCM, regression in r1592 Changes to queue in CABAC didn't get propagated to PCM code. r1602 Fix performance regression in r1582 Set the correct compiler flags. r1601 Rewrite deblock strength calculation, add asm Rewrite is significantly slower, but is necessary to make asm possible. Similar concept to ffmpeg's deblock strength asm. Roughly one order of magnitude faster than C. Overall, with the asm, saves ~100-300 clocks in deblocking per MB. r1600 Fix different output with differing sync-lookahead Also reduce memory consumption. r1599 Mark Win32 executable as large address aware r1598 Add "Fake interlaced" option This encodes all frames progressively yet flags the stream as interlaced. This makes it possible to encode valid 25p and 30p Blu-Ray streams. Also put the pulldown help section in a more appropriate place. r1597 Modify version.sh to output to stdout. Update configure to match. r1596 Set correct filesystem permissions for various files r1595 Fix regression in r1566 Intra stats need to be kept track of for fast intra decision. r1594 Fix rc-lookahead in encoding options SEI in 2-pass with VBV r1593 Reduce memory usage in 2-pass with b-adapt 2 r1592 Overhaul CABAC: faster, less cache usage Horribly munge up the CABAC tables to allow deduplication of some data. 
Saves 256 bytes of L1d cache in non-RD, 512 bytes in RD. Add asm versions of bypass and terminal; save L1i cache by re-using putbyte code. Further optimize encode_decision. All 3 primary CABAC functions fit in under 256 bytes of code total on x86_64. r1591 Fix typo in pulldown r1590 Fix bitrate calculation in progress status Was slightly incorrect due to using pts, which is out of order. r1589 Fix crash with sliced-threads on Phenom r1588 Fix condition for printing rc=cbr in options SEI Also fix crf-max formatting. r1587 Shrink even more constant arrays r1586 Add API function to trigger intra refresh Useful for interactive applications where the encoder knows that packet loss has occurred on the client. Full documentation is in x264.h. r1585 Fix intra refresh behavior with I-frames Intra refresh still allows I-frames (for scenecuts/etc). Now I-frames count as a full refresh, as opposed to instantly triggering a refresh. r1584 More cosmetics r1583 Fix unresolved symbol in r1573 gnu ld didn't complain, but some other linkers did. r1582 Remove unnecessary --enable options Change --enable-visualize to actually check for X11 support. r1581 Don't force row QPs to integer values with VBV VBV should no longer raise the bitrate of the video. That is, at a given quality level or average bitrate, turning on VBV should only lower the bitrate. This isn't quite true if adaptive quant is off, but nobody should be doing that anyways. Also may result in slightly more accurate per-row VBV ratecontrol. r1580 Add field-order detection to y4m demuxer r1579 Fix sliced-threads + interlaced Broken in r1546. r1578 Improve temporal MV prediction Predict based on the results of p16x16 search, not final MVs. This lets us get predictions even if mode decision chose intra. Also improves cache coherency. r1577 More accurate MV prediction on edges in lookahead r1576 Error out on invalid input stride Might catch some crashes due to buggy calling applications. 
r1575 Remove unnecessary debugging assert Shouldn't have been in r1568 to begin with. r1574 Shrink some more constant arrays r1573 Deduplicate asm constants, automate name prefixing Auto-prefix global constants with x264_ in cextern. Eliminate x264_ prefix from asm files; automate it in cglobal. Deduplicate asm constants wherever possible to save data cache (move them to a new const-a.asm). Remove x264_emms() entirely on non-x86 (don't even call an empty function). Add cextern_naked for a non-prefixed cextern (used in checkasm). r1572 Shrink a few x86 asm functions Add a few more instructions to cut down on the use of the 4-byte addressing mode. r1571 Make options SEI use weight* instead of wpred* More intuitive and maps more reasonably to the CLI options. Breaks statsfile backwards-compatibility. r1570 r1548 broke subme < 3 + p8x8/b8x8 Caused significantly worse compression. Preset-wise, only affected veryfast. Fixed by not modifying mvc in-place. r1569 More write-combining r1568 Reduce lookahead memory usage, cache misses Merge lowres_types with lowres_costs. r1567 Fix build on x86 with asm on but SSE off r1566 Don't calculate ref/partition stats if not necessary r1565 Split out MV prediction into mvpred.c Make common/macroblock.c a bit less gigantic. r1564 Fix mv predictor clipping on non-x86 (regression in r1548) r1563 Move getopt.c to x264cli sources from libx264 Only affects builds on systems without getopt.c. r1562 Move deblocking code to a separate file Should clean up frame.c a bit. r1561 fix ffms demuxer to support input timebase values > 2^31 r1560 Fix 10l in cache_load changes Broke constrained intra pred, probably not anything else. r1559 Faster fullpel predictor checking Also shave a few instructions off dia/hex motion estimation loops. r1558 Fix checkasm's generation of deblock inputs (regression in r1517) r1557 Fix printing of bitrate when timestamps aren't available Doesn't affect x264cli, but was broken in some other apps in CFR mode. 
r1556 Don't check mv0 twice One less SAD in motion estimation. Also rename bmv -> pmv; more accurate naming. r1555 Remove reordering restrictions from weightp Apparently the spec does allow two consecutive copies of the same frame in the reference list. This involves an incredibly ugly hack to wrap around the frame number. Very slight compression improvement. r1554 Print intra chroma pred modes in stats r1553 Add mv0 special case in pskip chroma MC Significantly faster pskip MC. r1552 Fix build scripts to work with non-GNU tools r1551 Faster deblock reference frame checks Use a lookup table to simplify logic r1550 Faster chroma CBP handling r1549 Fix issues with extremely large timebases With timebase denominators >= 2^30, x264 would silently overflow and cause odd issues. Now x264 will explicitly fail with timebase denominators >= 2^31 and work with timebase denominators 2^31 > x >= 2^30. r1548 MMX code for predictor rounding/clipping Faster predictor checking at subme < 3. r1547 Fix four minor bugs found by Clang r1546 Move deblocking/hpel into sliced threads Instead of doing both as a separate pass, do them during the main encode. This requires disabling deblocking between slices (disable_deblock_idc == 2). Overall performance gain is about 11% on --preset superfast with sliced threads. Doesn't reduce the amount of actual computation done: only better parallelizes it. r1545 Prefetch MB data in cache_load Dramatically reduces L1 cache misses. ~10% faster cache_load. r1544 Fix a ton of pessimization caused by aliasing in cache_save and cache_load r1543 Add CP128/M128 macros using SSE r1542 Fix various early terminations with slices Neighbouring type values (type_top, etc) are now loaded even if the MB isn't available for prediction. Significant overall performance increase (as high as 5-10%+) with lots of slices (e.g. with slice-max-size). 
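The timebase behaviour described in r1549 above can be summarised as a small range check. The Python sketch below only restates the thresholds given in that entry; the function name `classify_timebase_den` and its return strings are hypothetical and do not correspond to any x264 API.

```python
def classify_timebase_den(den: int) -> str:
    """Classify a timebase denominator per the r1549 description.

    Illustrative only: the thresholds come from the changelog entry,
    the function itself is not x264 code.
    """
    if den <= 0:
        raise ValueError("timebase denominator must be positive")
    if den >= 2**31:
        return "rejected"            # x264 now fails explicitly
    if den >= 2**30:
        return "accepted (fixed)"    # previously overflowed silently
    return "accepted"


if __name__ == "__main__":
    for den in (90000, 2**30, 2**31 - 1, 2**31):
        print(den, classify_timebase_den(den))
```

A caller that generates its own timebases could run a check like this before handing them to the encoder, rather than relying on the encoder's error path.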
r1541 Enable --fast-pskip on fast firstpass r1540 Make interlaced detection in avisynth only apply to field-based input Fixes improper flagging of progressive sources. r1539 Set psy=0 in lossless mode Doesn't actually affect output, just what's written in the SEI. r1538 Fix a use of sad_x4 that had non-mod64 stride Minimal speed improvement, but fixes a violation of internal api. r1537 Make keyint_min auto by default Gives more reasonable default settings when using short GOPs. r1536 Faster mv predictor checking at subme < 3 Simplify the predicted MV cost check. r1535 Special case in qpel refine for subme=1 ~15-20% faster qpel refine with subme=1. Some minor cleanups in refine_supel. r1534 Cosmetics: VLC tables r1533 Add faster mv0 special case for macroblock-tree Improves performance on low-motion video. r1532 Add miscompilation check for x264_clz Running a Phenom-optimized build of x264 (e.g. -march=amdfam10) on a non-Phenom CPU didn't SIGILL; instead it would silently produce incorrect output. Now, instead, it will error out loudly. r1531 Fixing floating-point exception in level-checking Doesn't cause any issues for x264cli, but might impact some calling apps that care (e.g. Delphi apps). r1530 Save a few bits in multislice encoding Set the initial QP for each slice to the last QP of the previous slice. r1529 Early termination in 16x8/8x16 search Combine the actual cost of the first partition with the predicted cost of the second to avoid searching the second when possible. Reduces the number of times the second partition is searched by up to ~75% in non-RD mode, ~10% in RD mode. Negligible effect on compression. r1528 Make MV prediction work across slice boundaries Should improve motion search with lots of small slices, e.g. with slice-max-size. Still restricted by sliced threads (won't cross the boundary between two threadslices). The output-changing part of the previous patch. 
r1527 Cleanup and simplification of macroblock_load Doesn't do anything now, but will be useful for many future changes. Splitting out neighbour calculation will make MBAFF implementation easier. Calculation of neighbour_frame value (actual neighbouring MBs, ignoring slices) will be useful for some future patches. r1526 Add missing #include to display-x11.c r1525 Add TFF/BFF detection to all demuxers Fix interlaced Avisynth input, automatically weave field-based input. r1524 Correctly mark output frames as BREF Simplify pic_out code. r1523 Fix HRD compliance As usual, the spec is so insanely obfuscated that it's impossible to get things right the first time. r1522 Better b16x8/8x16 early termination in B-frames A bit slower but up to 1-2% better compression. r1521 Fix 10L in B-skip improvement patch r1520 Fix printing of SEI header with VBV + ABR SEI header shouldn't say CBR unless bitrate == maxrate. r1519 Simplify slicetype_frame_cost Avoid redundant calculations when VBV is on (due to the intra-only call). Move most of the logic into per-MB code. r1518 Faster CABAC state copying for small partitions Save ~25 clocks per i4x4, i8x8, and sub8x8 RD call. r1517 Massive cosmetic and syntax cleanup Convert all applicable loops to use C99 loop index syntax. Clean up most inconsistent syntax in ratecontrol.c, visualize, ppc, etc. Replace log(x)/log(2) constructs with log2, and similar with log10. Fix all -Wshadow violations. Fix visualize support. r1516 Fix array overread in b8x16 search r1515 Faster direct check with subpartitions off Also simplify the whole function a bit. r1514 Print crf-max with appropriate precision in SEI r1513 Fix 10l in timecode seeking r1512 Fix 10L: Remove needless error check This error check was for cfr input + --timebase, but that doesn't happen, and brings about a bug with vfr input. r1511 Don't use 2 L1 refs with pyramid + ref=1 Slightly faster encoding with ref=1. 
r1510 Update copyright year in SEI header r1509 New "superfast" preset, much faster intra analysis Especially at the fastest settings, intra analysis was taking up the majority of MB analysis time. This patch takes a ton more shortcuts at the fastest encoding settings, decreasing compression 0.5-5% but improving speed greatly. Also rearrange the fastest presets a bit: now we have ultrafast, superfast, veryfast, faster. superfast is the old veryfast (but much faster due to this patch). veryfast is between the old veryfast and faster. faster is the same as before except with MB-tree on. Encoding with subme >= 5 should be unaffected by this patch. r1508 Avoid redundant MV prediction in duplicate refs r1507 Cosmetics in mvd handling Use a 2D array instead of doing manual pointer arithmetic. r1506 Fix make uninstall on systems with executable suffixes r1505 Add tune for still image compression There has been some demand for this from companies looking to use x264 for still image compression (it can outperform JPEG or JPEG-2000 by a factor of 2 or more). Still image compression is a bit different; because temporal stability isn't an issue, we can get away with far more powerful psy settings. r1504 Pad non-mod16 resolutions using the correct field Improves compression of interlaced videos with non-mod16 heights. r1503 Document slow/fast firstpass in --fullhelp r1502 Fix some misattributions in profiling Cycles spent in load_hadamard and the avg2 w16 ssse3 cacheline split code were misattributed. r1501 Much faster non-RD intra analysis Since every pred mode costs at least 1 bit, move that part into the initial SATD cost. This lets i4x4/i8x8 analysis terminate earlier. If the cost of the predicted mode is less than the cost of signalling any other mode, early-terminate the analysis. r1500 Fix stack alignment in sliced threads Could cause crashes when called from non-GCC-compiled applications. 
r1499 Cosmetics: use sizeof() where appropriate r1498 Split up analyse_init Save some time by avoiding some unnecessary inits and moving other parts to per-thread init. r1497 Reduce stack usage of b-adapt 2's trellis Also remove some redundant code. r1496 Various motion estimation optimizations Faster method of checking MV range. Predict MVs and cache MVs/MVDs for bidir qpel-RD. A whole bunch of other minor optimizations. Slightly better performance and compression. r1495 Overhaul macroblock_cache_rect Unify the rectangle functions into a single one similar to ffmpeg's fill_rectangle. Remove all cases of variable-size cache_rect calls; create a function-pointer-based system for handling such cases. Should greatly decrease code size required for such calls. r1494 Make a bunch of small functions ALWAYS_INLINE Probably no real effect for now, but needed for the next patch. r1493 Two compatibility fixes Add IA64 support in configure. r1492 Faster x264_macroblock_encode_pskip GCC is apparently unable to optimize out the calculation of a variable when it isn't used. r1491 Much more accurate B-skip detection at 2 < subme < 7 Use the same method that x264 uses for P-skip detection. This significantly improves quality (1-6%), but at a significant speed cost as well (5-20%). It also may have a very positive visual effect in cases where the inaccurate skip detection resulted in slightly-off vectors in B-frames. This could cause slight blurring or non-smooth motion in low-complexity frames at high quantizers. Not all instances of this problem are solved: the only universal solution is non-locally-optimal mode decision, which x264 does not currently have. subme >= 7 or <= 2 are unaffected. r1490 Reformat profile restrictions in --fullhelp. Put "no interlaced", "no lossless" on their own line to avoid them running into the default options list. r1489 Fix typo in configure r1488 Add support for spaces to iPhone GAS preprocessor script r1487 Fix slightly wrong mp4 duration. 
r1486 Fix link errors with newest gpac cvs gpac decided to randomly break API and require us to use their own custom malloc and free. r1485 Save a few bits in slice headers Don't override the maximum ref index in the slice header if it's the same as the default. Also update the naming of the relevant variables in the PPS. r1484 Shrink some arrays in x264_t Also remove an unnecessary assignment from cache_load. r1483 Use x264_log in more places instead of fprintf r1482 Fix two nondeterminisms Move noise reduction data into thread-specific data. Use correct reference list for L1 temporal predictors. r1481 "CRF-max" support with VBV This is a rather curious feature that may have more use than is initially obvious. In CRF mode with VBV enabled, CRF-max allows the user to specify a quality level which the encoder will never go below, even due to the effects of VBV. This is not the same as qpmax, which is not aware of issues like scene complexity. Setting this WILL cause VBV underflows in any situation where the encoder would have needed to exceed the relevant CRF to avoid underflow. Why might one want to do this even if it would cause VBV underflows? In the case of streaming, particularly ultra-low-latency streaming, it may be preferable to drop frames than to display frames that are of too low a quality. Thus, in extremely complex scenes, rather than display completely awful video, the streaming server could simply drop to a lower framerate. Scenecuts, which normally look terrible under situations like single-frame VBV, could be handled by just displaying them a bit later and dropping frames to compensate. In other words, it's better to see the scenecut 150ms delayed than for it to look like a blocky mess for 150ms. On the caller-side, this would be handled by detecting the output size of x264's frames and dropping future frames to compensate if necessary. 
This can also be used in normal encoding simply to ensure that VBV does not hurt quality too much (at the cost of potentially causing underflows). This can help quite a lot when using single-frame VBV and sliced threads, where VBV can often be somewhat unstable. r1480 Blu-ray support: NAL-HRD, VFR ratecontrol, filler, pulldown x264 can now generate Blu-ray-compliant streams for authoring Blu-ray Discs! Compliance tested using Sony BD-ROM Verifier 1.21. Thanks to The Criterion Collection for sponsoring compliance testing! An example command, using constant quality mode, for 1080p24 content: x264 --crf 16 --preset veryslow --tune film --weightp 0 --bframes 3 --nal-hrd vbr --vbv-maxrate 40000 --vbv-bufsize 30000 --level 4.1 --keyint 24 --b-pyramid strict --slices 4 --aud --colorprim "bt709" --transfer "bt709" --colormatrix "bt709" --sar 1:1 <input> -o <output> This command is much more complicated than usual due to the very complicated restrictions the Blu-ray spec has. Most options after "tune" are required by the spec. --weightp 0 is not, but there are known bugged Blu-ray player chipsets (Mediatek, notably) that will decode video with --weightp 1 or 2 incorrectly. Furthermore, note the Blu-ray spec has very strict limitations on allowed resolution/fps combinations. Examples include 1080p @ 24000/1001fps (NTSC FILM) and 720p @ 60000/1001fps. Detailed features introduced in this patch: Full NAL-HRD compliance, with both VBR (no filler) and CBR (filler) modes. Can be enabled with --nal-hrd vbr/cbr. libx264 now returns HRD timing information to the caller in the form of an x264_hrd_t. x264cli doesn't currently use it, but this information is critical for compliant TS muxing. Full VFR ratecontrol support: VBV, 1-pass ABR, and 2-pass modes. This means that, even without knowing the average framerate, x264 can achieve a correct bitrate in target bitrate modes. Note that this changes the statsfile format; first pass encodes made before this patch will have to be re-run. 
Pulldown support: libx264 allows the calling application to specify a pulldown mode for each frame. This is similar to the way that RFFs (Repeat Field Flags) work in MPEG-2. Note that libx264 does not modify timestamps: it assumes the calling application has set timestamps correctly for pulldown! x264cli contains an example implementation of caller-side pulldown code. Pic_struct support: necessary for pulldown and allows interlaced signalling. Also signal TFF vs BFF with delta_poc_bottom: should significantly improve interlaced compression. --tff and --bff should be preferred to the old --interlaced in order to tell x264 what field order to use. Huge thanks to Alex Giladi and Lamont Alston for their work on code that eventually became part of this patch. r1479 Timecode input/output --tcfile-in allows a user to specify a timecode v1 or v2 file to override input timestamps. Useful for dealing with VFR input, especially when FFMS/LAVF support isn't available. --tcfile-out writes a timecode v2 file containing the timecodes of the output file. New --timebase option allows a user to change the stream timebase. Intended primarily for forcing timebase with timecode files if necessary. When using --seek, note that x264 will seek in the timecode file as well. r1478 Mixed-refs support for B-frames Small speed cost, usually a few percent at most. Generally has lowest cost in cases when it isn't very useful. Up to ~2% better compression overall on highly complex sources. Also fix a few minor bugs in B-frame analysis and various bits of cleanup. r1477 Faster rounding of chroma DC coefficients r1476 Faster cabac_encode_decision_asm Minimizes instruction count, which also means smaller code. Various other slight changes to allow more instruction level parallelism. r1475 Faster hpel_filter On ssse3, use pmaddubsw for h filter too (similar to v filter). Change 32-bit v and c filters to write the result non-temporal. Add commented-out defines to disable non-temporal operation. 
Hardly any black magic here, but still a measurable win especially for ssse3. r1474 Ignore XYSCSS in y4m if the newer standard C tag is present Apparently y4mscaler will generate 4:2:0 files with XYSCSS set to 444 r1473 Fix regression in r1450 I_PCM blocks would cause x264 to crash or generate bad output. Simplify PCM handling. r1472 Fix crash with intra-refresh + aq-mode 0 r1471 Fix regression in r1453 r1453 broke psy-trellis with --trellis 2 r1470 Fix regression in r1449 Incorrectly placed thread MV check could result in rare thread MV internal errors, esp. with --non-deterministic. These weren't fatal errors (x264 could recover and continue with slight compression loss). r1469 Cut size of MVD arrays by a factor of 2 again Only store the MVDs of the edges of each MB. Thanks to Michael Niedermayer for the idea. r1468 Disable Altivec and VIS optimizations when --disable-asm is specified r1467 Fix a buffer overread on odd input resolutions r1466 Fix one bug, one corner case in VBV qp_novbv wasn't set correctly for B-frames. Disable ABR code for frames with zero complexity. Disable ABR code for CBR mode; it is completely unnecessary and can have negative consequences. r1465 Port Mans Rullgard's NEON intra prediction functions from ffmpeg r1464 Remove unused function Two other minor fixes. r1463 Use short startcode in more possible situations Previous patch didn't cover all possible uses according to B.1.2. r1462 Fix fastfirstpass Apparently the libx264 preset changes made "fastfirstpass" into "fastsecondpass" inadvertently. r1461 Fix various silly errors in the previous patches r1460 Actually error out if preset/tune/profile is invalid Got lost somewhere in the move to libx264-based presets. r1459 Faster probe_skip, 2x2 DC transform handling Move the 2x2 DC DCT into the dct_dc asm function to avoid some store-to-load forwarding penalties and extra register loads. Use dct_dc as part of the early termination in probe_skip. x86 asm partially by Holger Lubitz. 
ARM NEON asm by David Conrad. r1458 Use short startcodes whenever possible Saves one byte per frame for every slice beyond the first. Only applies to Annex-B output mode. r1457 New algorithm for AQ mode 2 Combines the auto-ness of AQ2 with a new var^0.25 instead of log(var) formula. Works better with MB-tree than the old AQ mode 2 and should give higher SSIM. r1456 Abide by the MinCR level limit Some Blu-ray analyzers were complaining about this. r1455 Make b-pyramid normal the default Now that b-pyramid works with MB-tree and is spec compliant, there's no real reason not to make it default. Improves compression 0-5% depending on the video. Also allow 0/1/2 to be used as aliases for none/strict/normal (for conciseness). r1454 Move presets, tunings, and profiles into libx264 Now any application calling libx264 can use them. Full documentation and guidelines for usage are included in x264.h. r1453 Faster, more accurate psy-RD caching Keep more variants of cached Hadamard scores and only calculate them when necessary. Results in more calculation, but simpler lookups. Slightly more accurate due to internal rounding in SATD and SA8D functions. r1452 Much faster and more efficient MVD handling Store MV deltas as clipped absolute values. This means CABAC no longer has to calculate absolute values in MV context selection. This also lets us cut the memory spent on MVDs by a factor of 2, speeding up cache_mvd and reducing memory usage by 32*threads*(num macroblocks) bytes. On a Core i7 encoding 1080p, this is about 3 megabytes saved. r1451 Add temporal predictor support to interlaced encoding 0.5-1% better compression in interlaced mode r1450 Keep track of macroblock partitions Allows vastly simpler motion compensation and direct MV calculation. 
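The memory figure quoted in r1452 above (32*threads*(num macroblocks) bytes, about 3 megabytes on a Core i7 at 1080p) can be checked with a short calculation. The Python sketch below assumes 16x16 macroblocks with dimensions rounded up to a multiple of 16, and it assumes 12 encoding threads purely for illustration; the changelog does not state the thread count, and the function name is hypothetical.

```python
def mvd_bytes_saved(width: int, height: int, threads: int) -> int:
    """Bytes saved per the r1452 formula: 32 * threads * macroblocks."""
    mb_w = (width + 15) // 16    # macroblock columns, rounded up
    mb_h = (height + 15) // 16   # macroblock rows, rounded up
    return 32 * threads * mb_w * mb_h


if __name__ == "__main__":
    # Assumed thread count of 12; not taken from the changelog.
    saved = mvd_bytes_saved(1920, 1080, threads=12)
    print(f"{saved} bytes, roughly {saved / 2**20:.2f} MiB")
```

With 120 x 68 = 8160 macroblocks and the assumed 12 threads, the formula gives 3,133,440 bytes, which is consistent with the "about 3 megabytes" stated in the entry.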
r1449 Much faster and simpler direct spatial calculation r1448 SimpleBlock requires Matroska Doctype v2 r1447 Add GPAC version check r1446 Fix stupid regression in interlaced in r1430 With ref > 8 or b-pyramid, an array over-read could cause slightly incorrect B-frames. r1445 Fix overread of scratch buffer Could cause crashes on non-mod16 frames. r1444 Fix integer overflow in chroma SSD check Could cause bad skips at very high quantizers on extreme inputs. r1443 Fix I and B-frame QPs with threads Rounding errors resulted in slightly wrong QPs with threads enabled. r1442 Fix compilation on ARM r1441 Remove unnecessary PIC support macros yasm has a directive to enable PIC globally r1440 Don't even try direct temporal when it would give junk MVs In PbBbP pyramid structure, the last "b" cannot use temporal because L0Ref0(L1Ref0) != L0Ref0. Don't even bother analyzing it, just use spatial. Should improve speed and direct auto effectiveness in CRF and 1-pass modes when b-pyramid is used. Also makes --direct temporal useful with --b-pyramid, since it will fall back to spatial for frames where temporal is broken. r1439 iPhone compilation support Also add --sysroot to configure options To build for iPhone 3gs / iPod touch 3g: CC=/Developer/Platforms/iPhoneOS.platform/Developer/usr/bin/gcc ./configure --host=arm-apple-darwin --sysroot=/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS3.0.sdk For older devices, add --extra-cflags='-arch armv6 -mcpu=arm1176jzf-s' --extra-ldflags='-arch armv6' --disable-asm r1438 ARM NEON versions of weightp functions r1437 Use #ifdef instead of #if in checkasm r1436 Make the ABR buffer consider the distance to the end of the video Should improve bitrate accuracy in 2-pass mode. May also slightly improve quality by allowing more variation earlier-on in a file. Also fix abr_buffer with 1-pass: it does something very different than what it does for 2-pass. 
Thus, the earlier change that increased it based on threads caused 1-pass ABR to be somewhat less accurate. r1435 Mark cli_input/output_t variables as const when possible r1434 mkv: Write the x264 version into the file header This only updates the "writing application"; matroska_ebml.c is the "muxing application", but the version string for that is still hardcoded. r1433 mkv: Write SimpleBlock instead of Block for frame headers mkvtoolnix writes these by default since 2009/04/13. Slightly simplifies muxer and allows 'mkvinfo -s' to show B-frames as 'B' (but not B-ref frames). r1432 Allow | as a separator between psy-rd and psy-trellis values. [,:/] are all taken when setting psy-trellis in a zone in an mencoder option. Also fix a comment typo and remove a useless line of code. r1431 Backport various speed tweak ideas from ffmpeg Add mv0 early termination to spatial direct calculation Up to twice as fast direct mv calculation on near-motionless video. Branchless CAVLC level code adjustment based on trailing ones. A few clocks faster. Check tc value before clipping in C version of deblock functions. Much faster, but nobody uses those anyways. Thanks to Michael Niedermayer for the ideas. r1430 Implement direct temporal + interlaced This was much easier than I expected. It will also be basically useless until TFF/BFF support gets in, since it requires delta_poc_bottom to be set correctly to work well. r1429 Allow longer keyints with intra refresh If a long keyint is specified (longer than macroblock width-1), the refresh will simply not occur all the time. In other words, a refresh will take place, and then x264 will wait until keyint is over to start another refresh. r1428 Overhaul sliced-threads VBV Make predictors thread-local and allow each thread to poll the others to get their predicted sizes. Many, many other tweaks to improve quality with small VBV and sliced threads. 
Note this may somewhat increase the risk of a VBV underflow in such extreme situations (single-frame VBV). This is tolerable, as most relevant use-cases are better off with a few rare underflows (even if they have to drop a slice) than consistent low quality. r1427 Print psy-(rd|trellis) with more precision in userdata SEI r1426 More formatting fixes in x264 help r1425 Faster 2x2 chroma DC dequant r1424 Write PASP atom in mp4 muxing Adds container-level aspect ratio support for mp4. r1423 Fix 2-pass ratecontrol continuation in case of missing statsfile Didn't work properly if MB-tree was enabled. r1422 Smarter QPRD Catch some cases in which RD checks can be avoided; reduces QPRD RD calls by 10-20%. r1421 Fix subpel iteration counts with B-frame analysis and subme 6/8 Since subme 6 means "like subme 5, except RD on P-frames", B-frame analysis shouldn't use the RD subpel counts at subme 6. Similarly with subme 8. Slightly faster (and very marginally worse) compression at subme 6 and 8. r1420 Simplify decimate checks in macroblock_encode Also fix a misleading comment. r1419 Improve bidir search, fix some artifacts in fades Modify analysis to allow bidir to use different motion vectors than L0/L1. Always try the <0,0,0,0> motion vector for bidir. Eliminates almost all errant motion vectors in fades. Slightly improves PSNR as well (~0.015 dB). r1418 Slightly faster predictor_difference_mmxext r1417 Add ability to adjust ratecontrol parameters on the fly encoder_reconfig and x264_picture_t->param can now be used to change ratecontrol parameters. This is extraordinarily useful in certain streaming situations where the encoder needs to adapt the bitrate to network circumstances. What can be changed: 1) CRF can be adjusted if in CRF mode. 2) VBV maxrate and bufsize can be adjusted if in VBV mode. 3) Bitrate can be adjusted if in CBR mode. However, x264 cannot switch between modes and cannot change bitrate in ABR mode.
Also fix a bug where x264_picture_t->param reconfig method would not always be frame-exact. Commit sponsored by SayMama video calling. r1416 Fix regression in r1406 Bitrate was printed incorrectly for some input framerates. r1415 Fix log2f detection, include order, some gcc warnings r1413 caused crashes on any system with malloc.h. Also switch to std=c99 or std=gnu99 if supported by the compiler. Fix visualize support. r1414 Fix abstraction violations in x264.c No calling application--not even x264cli--should ever look inside x264_t. r1413 Move -D CFLAGS to config.h r1412 Fix stat with large file support r1411 Implement ffms2 version check Depends on ffms2 version 2.13.1 (r272). Tries pkg-config's built-in version checking first. Uses only the preprocessor to avoid cross-compilation issues. r1410 Fix implicit CBR message to only print when in ABR mode Also make it print outside of debug mode. r1409 Add configure check for log2 support Some incredibly braindamaged operating systems, such as FreeBSD, blatantly ignore the C specification and omit certain functions that are required by ISO C. log2f is one of these functions that periodically goes missing in such operating systems. r1408 Add config.log support Now, if configure fails, you'll be able to see why. r1407 Fix cross-compiling with lavf, add support for ffms2.pc Also update configure script to work with newest ffms. r1406 Improve DTS generation, move DTS compression into libx264 This change fixes some cases in which PTS could be less than DTS. Additionally, a new parameter, b_dts_compress, enables DTS compression. DTS compression eliminates negative DTS (i.e. initial delay) due to B-frames. The algorithm changes timebase in order to avoid duplicating DTS. Currently, in x264cli, only the FLV muxer uses it. The MP4 muxer doesn't need it, as it uses an EditBox instead. r1405 Various threading-related cosmetics Simplify a lot of code and remove some unnecessary variables.
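To see where the negative DTS mentioned in r1406 comes from, here is a minimal, generic sketch. It is not x264's actual DTS code and it does not implement DTS compression. It assumes a constant frame duration and assigns decode timestamps by sorting the PTS values and shifting them back by the B-frame reorder delay; the function and variable names are hypothetical.

```python
def assign_dts(coded_order_pts, reorder_delay, frame_duration=1):
    """Generic illustration: DTS = sorted PTS list shifted back by the reorder delay."""
    display_order = sorted(coded_order_pts)
    return [pts - reorder_delay * frame_duration for pts in display_order]

# Coded order I0 P2 B1 P4 B3: each B-frame is coded after the P-frame it precedes.
coded_pts = [0, 2, 1, 4, 3]
dts = assign_dts(coded_pts, reorder_delay=1)
print(dts)

# Every frame must be decoded no later than it is displayed.
print(all(d <= p for d, p in zip(dts, coded_pts)))
```

The first DTS comes out negative, which is the initial delay the entry refers to. Containers that cannot store a negative DTS need either a mechanism like the MP4 EditBox or some form of DTS compression.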
r1404 Hardcode the bs_t in cavlc.c; passing it around is a waste Saves ~1.5kb of code size, very slight speed boost. r1403 Fix lavf input with pipes and image sequences x264 should now be able to encode from an image sequence using an image2-style formatted string (e.g. file%02d.jpg). r1402 Fix bitstream alignment with multiple slices Broke multi-slice encoding on CPUs without unaligned access. New system simply forces a bitstream realignment at the start of each writing function and flushes when it reaches the end. r1401 Merge nnz_backup with scratch buffer Slightly less memory usage. r1400 Use cross-prefix properly with pkg-config for cross-compiling r1399 Various performance optimizations Simplify and compact storage of direct motion vectors, faster --direct auto. Shrink various arrays to save a bit of cache. Simplify and reorganize B macroblock type writing in CABAC. Add some missing ALIGNED macros. r1398 Fix crash on new AMD M300 and similar CPUs Apparently these CPUs have SSE4a, but not misaligned SSE. r1397 Fix intra refresh with subme < 6 Also improve the quality of intra masking. r1396 Add support for multiple --tune options Tunes apply in the order they are listed in the case of conflicts. Psy tunings, i.e. film/animation/grain/psnr/ssim, cannot be combined. Also clarify --profile, which forces the limits of a profile, not the profile itself. r1395 Various bugfixes and tweaks in analysis Fix the oldest-ever bug in x264: b16x8 analysis used the wrong width for predict_mv. Fix cache_ref calls for slightly better MV prediction in bsub16x16 analysis. Make B-partition analysis consider reference frame costs. Various other minor changes. Overall very slightly improved mode decision and motion search in B-frames. r1394 More --me tesa optimizations r1393 Fix typo in configure r1392 Make --fps force CFR mode r1391 Eliminate intentional array overflow in quant matrix handling While it probably never caused problems, it was incredibly ugly and evil. 
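The image2-style string mentioned in r1403 (file%02d.jpg) is a printf-style template in which %02d stands for a zero-padded, two-digit frame number. A small sketch of how such a pattern expands into file names follows; the helper name and the starting index of 1 are assumptions for illustration, and how libavformat chooses the first index is not covered by the changelog.

```python
def expand_image_pattern(pattern, start, count):
    """Expand a printf-style sequence pattern into concrete file names."""
    return [pattern % n for n in range(start, start + count)]

# The start index of 1 is an assumed value for this example.
for name in expand_image_pattern("file%02d.jpg", start=1, count=3):
    print(name)
```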
r1390 Faster --me tesa r1389 Fix static pthreads + dynamically linked x264 on win32 Add the necessary static pthread initialization code to a new DLLmain function. r1388 Add getopt_long to the included getopt.c Fixes option handling on OSs that have a nonworking/missing getopt (e.g. Solaris). r1387 Faster psy-trellis init Remove some unnecessary zigzags. r1386 Simplify intra mode availability handling Slightly faster, 1.5kb smaller binary size, less code. r1385 Fix free callback, add x264_encoder_parameters function x264 would try to use the passed param struct after freeing if the param_free callback was set. Probably didn't cause any issues, as probably no programs used the callback in this location yet. A new x264_encoder_parameters function is now available in the API. This function lets the calling application grab the current state of the encoder's parameters. Use this in x264cli to ensure that the param struct used for set_param is updated with whatever changes x264_encoder_open has made to it. Patch partially by Anton Mitrofanov <BugMaster@narod.ru>. r1384 Fix x264 compilation on Apple GCC Apple's GCC stupidly ignores the ARM ABI and doesn't give any stack alignment beyond 4. r1383 Faster weightp motion search For blind-weight dupes, copy the motion vector from the main search and qpel-refine instead of doing a full search. Fix the p8x8 early termination, which had unexpected results when combined with blind weighting. Overall, marginally reduces compression but should potentially improve speed by over 5%. r1382 More correct padding constants for lowres planes Since lowres analysis isn't interlace-aware, we don't need to double the vertical padding for interlaced video. r1381 Fix some invalid reads caught by valgrind Temporal predictor calculation was misled by invalid reference counts for I-frames. r1380 Periodic intra refresh Uses SEI recovery points, a moving vertical "bar" of intra blocks, and motion vector restrictions to eliminate keyframes.
Attempt to hide the visual appearance of the intra bar when --no-psy isn't set. Enabled with --intra-refresh. The refresh interval is controlled using keyint, but won't exceed the number of macroblock columns in the frame. Greatly benefits low-latency streaming by making it possible to achieve constant framesize without intra-only encoding. Combined with slice-max size for one slice per packet, tests suggest effective resilience against packet loss as high as 25%. x264 is now the best free software low-latency video encoder in the world. Accordingly, change the API to add b_keyframe to the parameters present in output pictures. Calling applications should check this to see if a frame is seekable, not the frame type. Also make x264's motion estimation strictly abide by horizontal MV range limits in order for PIR to work. Also fix a major bug in sliced-threads VBV handling. Also change "auto" threads for sliced threads to "cores" instead of "1.5*cores" after performance testing. Also simplify ratecontrol's checking of first pass options. Also some minor tweaks to row-based VBV that should improve VBV accuracy on small frames. r1379 LAVF/FFMS input support, native VFR timestamp handling libx264 now takes three new API parameters. b_vfr_input tells x264 whether or not the input is VFR, and is 1 by default. i_timebase_num and i_timebase_den pass the timebase to x264. x264_picture_t now returns the DTS of each frame: the calling app need not calculate it anymore. Add libavformat and FFMS2 input support: requires libav* and ffms2 libraries respectively. FFMS2 is _STRONGLY_ preferred over libavformat: we encourage all distributions to compile with FFMS2 support if at all possible. FFMS2 can be found at http://code.google.com/p/ffmpegsource/. --index, a new x264cli option, allows the user to store (or load) an FFMS2 index file for future use, to avoid re-indexing in the future. Overhaul the muxers to pass through timestamps instead of assuming CFR.
Also overhaul muxers to correctly use b_annexb and b_repeat_headers to simplify the code. Remove VFW input support, since it's now pretty much redundant with native AVS support and LAVF support. Finally, overhaul a large part of the x264cli internals. --force-cfr, a new x264cli option, allows the user to force the old method of timestamp handling. May be useful in case of a source with broken timestamps. Avisynth, YUV, and Y4M input are all still CFR. LAVF or FFMS2 must be used for VFR support. Do note that this patch does *not* add VFR ratecontrol yet. Support for telecined input is also somewhat dubious at the moment. Large parts of this patch by Mike Gurlitz <mike.gurlitz@gmail.com>, Steven Walters <kemuri9@gmail.com>, and Yusuke Nakamura <muken.the.vfrmaniac@gmail.com>. r1378 More help typo fixes r1377 Fix x264_clz on inputs > 1<<31 (though x264 never generates such inputs) r1376 Don't do sum/ssd analysis if weightp == 1 Typo fixes in comments and help. r1375 Fix two bugs in 2-pass ratecontrol last_qscale_for wasn't set during the 2pass init code. abr_buffer was way too small in the case of multiple threads, so accordingly increase its buffer size based on the number of threads. May significantly increase quality with many threads in 2-pass mode, especially in cases with extremely large I-frames, such as anime. r1374 Avisynth-MT and 2.6 compatibility fixes Explain to the user why YV12 conversion is forced with Avisynth 2.6. Fix encoding with Avisynth-MT scripts by inserting the necessary Distributor() call; speeds such scripts back up to expected levels. r1373 Fix zone parsing on mingw Due to MinGW evidently being in the hands of a pack of phenomenal idiots, MinGW does not have strtok_r, a basic string function. As such, remove the dependency on strtok_r in zone parsing. Previously, using zones for anything other than ratecontrol failed. r1372 More lookahead optimizations Under subme 1, don't do any qpel search at all and round temporal MVs accordingly.
Drop internal subme with subme 1 to do fullpel predictor checks only. Other minor optimizations. r1371 missing changes from previous commits r1370 Fix regression in direct=auto/temporal in r1364 Bug caused rare race condition in frame reference handling. This resulted in invalid bitstreams in some B-frames and, very rarely, crashes. r1369 Add fast pskip to x264 SEI info header r1368 Minor seeking fix with Avisynth input Seeking past the end of the input with --seek would result in the same frame being repeated over and over. r1367 Add support for MB-tree + B-pyramid Modify B-adapt 2 to consider pyramid in its calculations. Generally results in many more B-frames being used when pyramid is on. Modify MB-tree statsfile reading to handle the reordering necessary. Make differing keyint or pyramid between passes into a fatal error. r1366 Use aliasing-avoidance macros in array_non_zero r1365 MMX version of 8x8 interlaced zigzag Just as fast as SSSE3 on Nehalem (and faster on Conroe/Penryn), so remove the SSSE3 version. r1364 Bring back slice-based threading support Enabled with --sliced-threads Unlike normal threading, adds no encoding latency. Less efficient than normal threading, both performance and compression-wise. Useful for low-latency encoding environments where performance is still important, such as HD videoconferencing. Add --tune zerolatency, which eliminates all x264 encoder-side latency (no delayed frames at all). Some tweaks to VBV ratecontrol and lookahead (in addition to those required by sliced threading). Commit sponsored by a media streaming company that wishes to remain anonymous. r1363 Add more detailed help for presets/tunes/profiles Shows what options they represent. r1362 qpel RD no longer needs mbcmp_unaligned r1361 ensure that all boolean options are {0,1} so they print consistently in the options SEI r1360 Actually do r1356 Somehow commit r1356 got lost in the ether. I'm not sure how, but now it's fixed.
r1359 Remove some unused code from x264.c r1358 SSSE3 version of zigzag_8x8_field Slightly faster interlaced encoding with 8x8dct. Helps most on Nehalem, somewhat disappointing on Conroe/Penryn. r1357 Fix crash in interlaced with >8 refs Crash introduced in weightp. r1356 Significantly faster qpel-RD Cache the results of MC, like in bidir-RD. Slightly changes output due to the necessary reordering of satd/RD calls. 5-10% faster qpel-RD. r1355 Add x264 prefix to functions with ffmpeg equivalents Not important now, but will be when we add libav* input support. r1354 10L in r1353 Broke mp4 output. r1353 Enhanced Avisynth input support Requires avisynth_c.h from the Avisynth API headers. Reports errors properly from Avisynth script input. Automatically construct input scripts for almost any input file. Tries ffmpegsource2, DSS2, directshowsource, and many other sourcing methods, based on the input file extension. Automatically converts to YV12. r1352 Much faster weightp Move sum/ssd calculation out of lookahead and do it only once per frame. Also various minor optimizations, cosmetics, and cleanups. r1351 Fix bugs in fps/timestamp handling in FLV muxer r1350 Fix bug in weightp analysis Weights weren't reset upon early terminations, so old (wrong) weights could stick around. Small compression improvement. r1349 Minor deblocking optimization, update comments r1348 Fix weightb with delta_poc_bottom Has no effect yet, but will be required once we add TFF/BFF signalling support in interlaced mode. Gives 0.5-0.7% better compression with proper TFF/BFF signalling. r1347 Give more meaningful error if 1st/2nd pass resolution differ r1346 Fix extremely rare deadlock with sync-lookahead Patch partially by Anton Mitrofanov. r1345 Only print weightp stats if there were P-frames r1344 Faster lookahead with subme=1 If it hasn't been clear already, don't use subme=1 as a "fast first pass" option. Use subme=2 instead; 1 and below now enable a fast (lower quality) lookahead mode. 
r1343 Faster weightp analysis Modify pixel_var slightly to return the necessary information and use it for weight analysis instead of sad/ssd. Various minor cosmetics. r1342 Fix two issues in weightp If analysis decided on an offset of -128, x264 would create non-compliant streams. Fix some cases with nearly all intra blocks where analysis could pick very weird weights. Also add some asserts to check compliancy. r1341 Allow compilation with non-Apple GCC on OS X r1340 Use __attribute__((may_alias)) for type-punning GCC thinks pointer casts to unions aren't valid with strict aliasing. See http://gcc.gnu.org/onlinedocs/gcc-4.4.2/gcc/Optimize-Options.html#Type_002dpunning. Also use M32() in y4m.c. Enable -Wstrict-aliasing again since all such warnings are fixed. r1339 100l in deadlock fix r1338 FLV muxing support r1337 Fix rare deadlock introduced in weightp r1336 Actually add -Wno-strict-aliasing to configure r1335 Various weightp fixes Make weightp results match in threaded vs non-threaded mode. Fix two-pass with slow-firstpass. r1334 Fix all aliasing violations New type-punning macros perform write/read-combining without aliasing violations per the second-to-last part of 6.5.7 in the C99 specification. GCC 4.4, however, doesn't seem to have read this part of the spec and still warns about the violations. Regardless, it seems to fix all known aliasing miscompilations, so perhaps the GCC warning generator is just broken. As such, add -Wno-strict-aliasing to CFLAGS. r1333 Fix 10l in weightp on ARM r1332 Fix one (of possibly many) miscompilations in weightp Use NOINLINE and some emms calls to fix emms reordering issues. This issue occurred with some GCC versions if threads > 1 and the phase of the moon was right. Also a cosmetic in x264.c. r1331 Fix pixel_ssd on win64 Didn't preserve XMM registers, may or may not have caused problems. r1330 Fix weightp logfile parsing on MinGW r1329 cosmetics r1328 Fix weightp on ARM + PPC No ARM or PPC assembly yet though. 
r1327 Weighted P-frame prediction Merge Dylan's Google Summer of Code 2009 tree. Detect fades and use weighted prediction to improve compression and quality. "Blind" mode provides a small overall quality increase by using a -1 offset without doing any analysis, as described in JVT-AB033. "Smart", the default mode, also performs fade detection and decides weights accordingly. MB-tree takes into account the effects of "smart" analysis in lookahead, even further improving quality in fades. If psy is on, mbtree is on, interlaced is off, and weightp is off, fade detection will still be performed. However, it will be used to adjust quality instead of create actual weights. This will improve quality in fades when encoding in Baseline profile. Doesn't add support for interlaced encoding with weightp yet. Only adds support for luma weights, not chroma weights. Internal code for chroma weights is in, but there's no analysis yet. Baseline profile requires that weightp be off. All weightp modes may cause minor breakage in non-compliant decoders that take shortcuts in deblocking reference frame checks. "Smart" may cause serious breakage in non-compliant decoders that take shortcuts in handling of duplicate reference frames. Thanks to Google for sponsoring our most successful Summer of Code yet! r1326 Fix assert failure in the case of forced i-frames Note that this applies to non-IDR i-frames, not IDR-frames. This fix is also required for future open-gop. r1325 Fix issues relating to input/output files being pipes/FIFOs r1324 Various ARM-related fixes Fix comment for mc_copy_neon. Fix memzero_aligned_neon prototype. Update NEON (i)dct_dc prototypes. Duplicate x86 behavior for global+hidden functions. r1323 Fix miscompilation with gcc 4.3 on ARM Aliasing violation in spatial prediction caused nasty artifacts. Shut up two other GCC warnings while we're at it. 
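As a rough illustration of what the weights and offsets in r1327 do to a reference block, the sketch below applies the explicit weighted-prediction formula from the H.264 specification to a handful of 8-bit luma samples: scale by weight / 2^log2_denom with rounding, add the offset, then clip. The sample values and the function name are made up for the example, and pairing a unit weight with the -1 offset for "blind" mode is an assumption. None of this reproduces x264's fade analysis.

```python
def weighted_pred(ref, weight, offset, log2_denom):
    """Apply H.264-style explicit weighted prediction to 8-bit samples."""
    out = []
    for sample in ref:
        if log2_denom >= 1:
            # Scale with rounding, then add the offset.
            value = ((sample * weight + (1 << (log2_denom - 1))) >> log2_denom) + offset
        else:
            value = sample * weight + offset
        out.append(max(0, min(255, value)))  # clip to the 8-bit range
    return out

ref = [200, 120, 64, 16]  # hypothetical reference samples

# A fade to half brightness: weight 32 with denominator 2**6 scales by 0.5.
print(weighted_pred(ref, weight=32, offset=0, log2_denom=6))

# Unit weight with a -1 offset (assumed parameters for the "blind" case).
print(weighted_pred(ref, weight=64, offset=-1, log2_denom=6))
```

When a fade scales every sample by roughly the same factor, a single weight lets the predictor match the faded frame closely, leaving less residual to code.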
r1322 Fix extremely rare infinite loop in 2-pass VBV Implicit conversion from double->float lost enough precision to cause the loop termination condition to never trigger. Bug report by Tal Aloni. r1321 Fix large file support, broken in r1302 r1320 Dramatically reduce size of pixel_ssd_* asm functions ~10k of code size eliminated. r1319 fix bottom-right pixel of lowres planes, which was uninitialized. weirdly, valgrind reported this only with --no-asm. r1310 cosmetics r1309 ISC-license x86inc.asm As the assembly abstraction layer is very useful in non-x264 projects, it is now ISC (simplified BSD) so that others, even in commercial projects, can use it as well. r1308 Various minor CABAC optimizations r1307 Fix bug in b-pyramid strict Bug caused invalid streams in some situations. r1306 Remove non-mod16 warning Compression only "suffers" by an extremely marginal amount and too many people misinterpret the warning. r1305 Fix two warnings + some minor optimizations r1304 Fix a typo in b-pyramid help And an errant space in common/macroblock.c r1303 A bit more write-combining in macroblock_cache_load r1302 split muxers.c into one file per format simplify internal muxer API r1301 Update fprofile with the latest change to b-pyramid r1300 Fix assertion fail and incorrect costs with pyramid+VBV Deal properly with QPfile'd B-refs. x264 should handle multiple B-refs per minigop now, though only via forced frametypes. r1299 Improve CRF initial QP selection, fix get_qscale bug If qcomp=1 (as in mb-tree), we don't need ABR_INIT_QP. get_qscale could give slightly weird results with still images r1298 Print more accurate error message if dump_yuv fails r1297 Reduce memory usage of b-adapt 2 trellis Also fix a minor bug where the algorithm ignored the last frame in the trellis. r1296 Make B-pyramid spec-compliant The rules of the specification with regard to picture buffering for pyramid coding are widely ignored.
x264's b-pyramid implementation, despite being practically identical to that proposed by the original paper, was technically not compliant. Now it is. Two modes are now available: 1) strict b-pyramid, while worse for compression, follows the rule mandated by Blu-ray (no P-frames can reference B-frames) 2) normal b-pyramid, which is like the old mode except fully compliant. This patch also adds MMCO support (necessary for compliant pyramid in some cases). MB-tree still doesn't support b-pyramid (but will soon). r1295 Add missing free for nal_buffer Fixes a memory leak. r1294 sync yasm macros to ffmpeg r1293 eliminate some divisions r1292 Fix glitches with slow-firstpass + weightb + multiref + 2pass Bug in r1277 r1291 Simplify some code in b-adapt 2's trellis r1290 Fix a very rare integer overflow in slicetype analysis Caused an assert failure when it occurred. Bug is as old as adaptive B-frames. r1289 Reduce the aggressiveness of 2-pass VBV Now that B-frames are properly covered, we don't have to be as aggressive. This eliminates some issues with skyrocketing QPs in B-frames in 2-pass VBV. r1288 Fix regression: disable flash detection without B-frames r1287 change all dct arrays to 1d. the C standard doesn't allow you to iterate 1-dimensionally over 2d arrays, and nothing other than the dsp functions themselves cares about the 2dness of dct. this fixes a miscompilation in x264_mb_optimize_chroma_dc. r1286 Add row-based VBV for B-frames While B-frames still aren't explicitly covered by ratecontrol, this should resolve issues of VBV underflows due to larger-than-expected B-frames. r1285 Improve VBV, fix bug in 2-pass VBV introduced in MB-tree Bug caused AQ'd row/frame costs to not be calculated (and thus caused underflows). Also make VBV more aggressive with more threads in 2-pass mode. Finally, --ratetol now affects VBV aggressiveness (higher is less aggressive). r1284 Optimize exp2fix8 Slightly faster and more accurate rounding. 
r1283 Avoid scenecuts in flashes and similar situations "Flashes" are defined as any scene which lasts a very short period before a previous scene returns. A common example of this is of course a camera flash. Accordingly, look ahead during scenecut analysis and rule out the possibility of certain frames being scenecuts. Also handles cases of tons of short scenes in sequence and avoids making those scenecuts as well. Can only catch flashes of 1 frame in length with b-adapt 1. With b-adapt 2, can catch flashes of length --bframes. Speed cost should be negligible. r1282 Fix bug where x264 generated non-compliant bitstreams with insane SAR values r1281 rm msvc project files and related ifdefs r1280 SSE4 version of 4x4 idct 27->24 clocks on Nehalem. This is really just an excuse to use "movsd" in a real function. Add some comments to subsum-related macros in x86util. r1279 Constrained intra prediction support Enable with --constrained-intra. Significantly reduces compression, but required for the base layer of SVC encodes and maybe some other use-cases. Commit sponsored by a media streaming company that wishes to remain anonymous. r1278 Slightly improve non-RD p8x8 mode decision Subpartition costs are effectively zero in CABAC if sub-8x8 search is off. r1277 Reorder reference frames optimally on second pass About +0.1-0.2% compression at normal bitrates, up to +1% at very low bitrates. Only works if the first pass uses the same number of refs as the second (i.e. not with fast first pass). Thus, only worthwhile at insanely slow speeds: as such, enable slow-firstpass by default with preset placebo. Note that this changes the stats file format! r1276 Fix typo in ratecontrol_summary r1275 Clip log2_max_frame_num It's still much higher than it needs to be, but that will be fixed with the upcoming MMCO patch. Also make sure we don't write too large a frame_num or poc in slice header.
r1274 Fix some issues with 3-pass statsfile handling The value of i_frame during encoder_close was incorrect. r1273 Fix ctrl-C termination message with few frames encoded r1272 Add support for single-frame VBV, improve compliance This allows both constant-framesize and capped-framesize encoding. Literal constant framesize isn't actually supported yet due to the lack of filler support. Example with 30fps video: --vbv-bufsize 200 --vbv-maxrate 6000 will ensure that no frame is ever larger than 200 kilobits. One example use-case of this is for zero-delay streaming where bandwidth costs need to be minimized. If every frame is smaller than 200 kilobits and the client has a 6 megabit connection, every single frame can be instantly sent to the client and handled without any decoder-side buffer. Fix a mistake in VBV calculation--this may have caused the VBV to be slightly non-compliant in some situations without x264 realizing it. Add primitive prediction handling for rows with quantizers lower than their reference. This slightly improves VBV in CBR mode. Various other minor improvements to VBV, mostly to make single-frame VBV work. Commit sponsored by a media streaming company that wishes to remain anonymous. r1271 Fix 10l in API change frame_num was set to 1, not 0, for the first frame. This broke spec compliance. Didn't actually seem to cause any problems though except for breaking decoding on Quicktime. r1270 Allow user-set FPS for inputs other than YUV r1269 Improve threaded frame handling Avoid unnecessary cond_wait r1268 Attempt to detect miscompilation due to bug in gcc 4.2 I don't know if this bug still affects latest x264, but it can't hurt to try to detect it. Accordingly refuse to open the encoder if detected.
Apparently VLC (on Windows) has been distributed for some time with a completely broken x264 due to the use of a completely broken compiler (gcc 4.2). In particular, the MV costs seem to be calculated incorrectly on win32 when linking from an application compiled without -ffast-math to an application with -ffast-math. I am not entirely certain why this occurs, but the result is, unsurprisingly, encoding quality that makes MPEG-2 look good, due to the motion search being completely broken. r1267 Really fix encoder_close crash this time Not-entirely-fixed in r1253. r1266 Check for 16x16 partitions masquerading as smaller ones Saves a few bits when using qpel-RD. r1265 Update config.guess/sub; add Snow Leopard support r1264 Fix integer overflow in 2-pass VBV Bug caused slight undersizing in 2-pass mode in some cases. r1263 Fix bug with various bizarre commandline combinations and mbtree Second pass would have mbtree on even though the first pass didn't (and thus encoding would immediately fail). r1262 Add intra prediction modes to output stats Also eliminate some NANs in stat output with intra-only encoding. Marginal speedup: disable stat calculation if log level is below X264_LOG_INFO. Various minor cosmetics. r1261 Overhaul syntax in muxers.c/matroska.c The inconsistent syntax in these files has finally come to an end. r1260 Major API change: encapsulate NALs within libx264 libx264 now returns NAL units instead of raw data. x264_nal_encode is no longer a public function. See x264.h for full documentation of changes. New parameter: b_annexb, on by default. If disabled, startcodes are replaced by sizes as in mp4. x264's VBV now works on a NAL level, taking into account escape codes. VBV will also take into account the bit cost of SPS/PPS, but only if b_repeat_headers is set. Add an overhead tracking system to VBV to better predict the constant overhead of frames (headers, NALU overhead, etc).
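The difference that b_annexb controls in r1260 can be shown on raw bytes. The sketch below is not x264 code: it is a standalone illustration that takes an Annex-B byte stream, splits it on 00 00 01 and 00 00 00 01 startcodes, and re-emits each NAL unit with a big-endian length prefix, as mp4-style storage does. The 4-byte prefix size and all names here are assumptions for the example; escape (emulation prevention) bytes inside the NAL payload are left untouched.

```python
import re

# Matches both the 3-byte (00 00 01) and 4-byte (00 00 00 01) startcodes.
START_CODE = re.compile(rb"\x00{2,}\x01")

def annexb_to_length_prefixed(stream, length_size=4):
    """Replace Annex-B startcodes with big-endian NAL sizes (mp4-style)."""
    out = bytearray()
    for nal in START_CODE.split(stream):
        if not nal:
            continue  # empty piece before the first startcode
        out += len(nal).to_bytes(length_size, "big")
        out += nal
    return bytes(out)

# Two made-up NAL payloads, one behind each startcode variant.
stream = b"\x00\x00\x00\x01\x67\x42" + b"\x00\x00\x01\x68\xce\x3c\x80"
print(annexb_to_length_prefixed(stream).hex(" "))
```

A length prefix lets a demuxer jump from one NAL unit to the next without scanning for startcodes, which is why container formats such as mp4 use it.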
r1259 Add missing fclose for mbtree input statsfile on second pass Bug report by VFRmaniac r1258 Improve progress indicator behavior Progress indicator will now indicate based on output frame, not input frame. r1257 Update yasm configure check lzcnt apparently requires yasm 0.6.2. r1256 Make MV costs global instead of static Fixes some extremely rare threading race conditions and makes the code cleaner. Downside: slightly higher memory usage when calling multiple encoders from the same application. r1255 Don't print scenecut message multiple times in verbose mode Occurred mostly with b-adapt 2. r1254 Optimize rounding of luma and chroma DC coefficients Reduce bitrate mostly-losslessly at low quantizers. In some rare cases, bitrate reduction may be as high as 10%. Luma rounding optimization (helps much less than chroma) requires trellis. r1253 Fix crash if encoder_close is called before delayed frames are flushed Also no longer flush frames when ctrl-Cing x264, so x264 will close faster. r1252 Improve x264 help Now has three help options: --help, --longhelp, and --fullhelp. --help only shows the most basic options; most users should not need more than these. Add usage examples. Fix typo in a comment. r1251 Factor out a redundant RD call in qpel-RD Fixes a problem that was supposed to be, but didn't, get fully fixed in r1238. r1250 Fix RD early-skip Small quality improvement and speedup, was broken by r1214. r1249 Faster CAVLC mb header writing for B macroblocks r1248 Compile fixes for pre-ARMv6T2 and/or PIC r1247 Change priority handling on some OSs Instead of setting the lookahead thread to max priority, lower all the other threads' priorities instead. This is particularly useful when the "max priority" is "realtime", as in Windows, which can cause some problems. r1246 Threaded lookahead Move lookahead into a separate thread, set to higher priority than the other threads, for optimal performance. 
Reduces the amount that lookahead bottlenecks encoding, greatly increasing performance with lookahead-intensive settings (e.g. b-adapt 2) on many-core CPUs. Buffer size can be controlled with --sync-lookahead, which defaults to auto (threads+bframes buffer size). Note that this buffer is separate from the rc-lookahead value. Note also that this does not split lookahead itself into multiple threads yet; this may be added in the future. Additionally, split frames into "fdec" and "fenc" frame types and keep the two separate. This split greatly reduces memory usage, which helps compensate for the larger lookahead size. Extremely special thanks to Michael Kazmier and Alex Giladi of Avail Media, the original authors of this patch. r1245 Force a link error in case of incompatible API This is because the number of bug reports due to miscompiled ffmpeg builds is reaching critical mass. The name of x264_encoder_open is now #defined based on the current X264_BUILD. Note that this changes the calling convention required for dlopen, but not for ordinary calls to x264_encoder_open. r1244 Get rid of "CBR" descriptor from qcomp Though technically accurate in some vague way, I have never actually seen this option used correctly, rather it has been used by hundreds of people who can't read the documentation and believe that qcomp=0 is what should be used for CBR encoding. r1243 Faster me=tesa But it still spends all too much time in me_search_ref rather than asm. r1242 Multi-slice encoding support Slicing support is available through three methods (which can be mixed): --slices sets a number of slices per frame and ensures rectangular slices (required for Blu-ray). Overridden by either of the following options: --slice-max-mbs sets a maximum number of macroblocks per slice. --slice-max-size sets a maximum slice size, in bytes (includes NAL overhead). 
Implement macroblock re-encoding support to allow highly accurate slice size limitation. Might be useful for other things in the future, too. r1241 Fix a valgrind warning in b-adapt 2 r1240 fix asm symbols for oprofile (regression in r1221) r1239 Fix bug in intra analysis in B-frames i8x8/i4x4 never got analysed when fast_intra was toggled and RD was off; up to a 2-3% quality improvement in non-RD mode. With this bug dating back to r369, this is probably the second-oldest bug ever fixed in x264. r1238 Fix bug in b16x16 qpel RD Incorrect cost was used to initialize the search. r1237 Check minimum chroma QP in addition to luma QP during CQM init Correctly error out if the implied minimum chroma QP is too low. Add missing emms to checkasm macroblock_tree_propagate test. r1236 Faster mbtree propagate and x264_log2, less memory usage Avoid an int->float conversion with a small table. Change lowres_inter_types to a bitfield; cut its size by 75%. Somewhat lower memory usage with lots of bframes. Make log2/exp2 tables global to avoid duplication. r1235 Fix keyint=1 + VBV + rc-lookahead r1234 Faster x264_exp2fix8 22->13 cycles on Core 2 with mfpmath=sse r1233 compile x86 with fpmath=sse by default r1232 ARM configure: enable NEON-related options by default When compiling for ARM, x264 will compile by default for Cortex A8 unless specified otherwise. To compile for pre-ARMv6, --disable-asm is required. r1231 2-pass VBV fixes Properly run slicetype frame cost with 2pass + MB-tree. Slash the VBV rate tolerance in 2-pass mode; increasing it made sense for the highly reactive 1-pass VBV algorithm, but not for 2-pass. 2-pass's planned frame sizes are guaranteed to be reasonable, since they are based on a real first pass, while 1-pass's, based on lookahead SATD, cannot always be trusted. 
r1230 GSOC merge part 8: ARM NEON intra prediction assembly functions (partial) 4x4 dc/h/ddr/ddl, 8x8 dc/h, 8x8c h/v, 16x16 dc/h/v r1229 GSOC merge part 7: ARM NEON deblock assembly functions (partial) Originally written for ffmpeg by Mans Rullgard; ported by David. Luma and chroma inter deblocking; no intra yet. r1228 GSOC merge part 6: ARM NEON quant assembly functions (partial) (de)quant 4x4, (de)quant 8x8, (de)quant DC, coeff_last r1227 GSOC merge part 5: ARM NEON dct assembly functions (i)dct4x4dc, (i)dct4x4, (i)dct8x8, (i)dct_dc, zigzag_scan_frame_4x4 r1226 GSOC merge part 4: ARM NEON mc assembly functions prefetch, memcpy_aligned, memzero_aligned, avg, mc_luma, get_ref, mc_chroma, hpel_filter, frame_init_lowres r1225 GSOC merge part 3: ARM NEON pixel assembly functions SAD, SADX3/X4, SSD, SATD, SA8D, Hadamard_AC, VAR, VAR2, SSIM r1224 GSOC merge part 2: ARM stack alignment Neither GCC nor ARMCC support 16 byte stack alignment despite the fact that NEON loads require it. These macros only work for arrays, but fortunately that covers almost all instances of stack alignment in x264. r1223 Fix unaligned accesses in bitstream writer Fixes x264 on CPUs with no unaligned access support (e.g. SPARC). Improves performance marginally on CPUs with penalties for unaligned stores (e.g. some x86). r1222 Fix bug in calculation of I-frame costs with AQ. r1221 GSOC merge part 1: Framework for ARM assembly optimizations x264 will detect which ARM core it's building for and only build NEON asm if the target is ARMv6 or above, then enable NEON at runtime. r1220 Fix a bug in checkasm and two OSX fixes MC chroma checkasm test could crash in some situations Remove -lmx, as it's not needed and the iPhone doesn't have it. Remove unused sqrtf emulation; it breaks if math.h is included. r1219 Improve QPRD Always check the last macroblock's QP, even if the normal search doesn't reach it. Raise the failure threshold when moving towards the last macroblock's QP. 
0.2-1% improved compression. r1218 Fix MB-tree with keyint<3 Also slightly improve VBV keyint handling. r1217 Fix bug in VBV lookahead + no MB-tree I-frames need to have VBV lookahead run on them as well. r1216 Add support for frame-accurate parameter changes Parameter structs can now be passed with individual frames. The previous method would only change the parameter of what was currently being encoded, which due to delay might be very far from an intended exact frame. Also add support for changing aspect ratio. Only works in a stream with repeating headers and requires the caller to force an IDR to ensure instant effect. r1215 Fix x264_encoder_reconfig with multithreading New behavior: reconfigging the encoder will result in changes being applied to each of the encoding threads as they finish encoding the current frame. r1214 Fix two bugs in QPRD QPRD could in some cases force blocks to skip when they shouldn't be ~(+0.01db) Force QPRD to abide by qpmin/qpmax restrictions. r1213 Lookahead VBV Use the large-scale lookahead capability introduced in MB-tree for ratecontrol purposes. (Does not require MB-tree, however.) Greatly improved quality and compliance in 1-pass VBV mode, especially in CBR; +2db OPSNR or more in some cases. Fix some other bugs in VBV, which should improve non-lookahead mode as well. Change the tolerance algorithm in row VBV to allow for more significant mispredictions when buffer is nearly full. Note that due to the fixing of an extremely long-standing bug (>1 year), bitrates may change by nontrivial amounts in CRF without MB-tree. r1212 Fix bug in b-adapt 1 B-adapt 1 didn't use more than MAX(1,bframes-1) B-frames when MB-tree was off. r1211 Fix a potential failure in VBV If VBV does underflow, ratecontrol could be permanently broken for the rest of the clip. Revert part of the previous VBV changes to fix this. r1210 new API function x264_encoder_delayed_frames. fix x264cli on streams whose total length is less than the encoder latency. 
r1209 Add no-mbtree to fprofile (and fix pyramid in fprofile) r1208 Don't print a warning about direct=auto in 2pass when B-frames are off r1207 fix lowres padding, which failed to extrapolate the right side for some resolutions. fix a buffer overread in x264_mbtree_propagate_cost_sse2. no effect on actual behavior, only theoretical correctness. fix x264_slicetype_frame_cost_recalculate on I-frames, which previously used all 0 mb costs. shut up a valgrind warning in predict_8x8_filter_mmx. r1206 simd part of x264_macroblock_tree_propagate. 1.6x faster on conroe. r1205 MB-tree fixes: AQ was applied inconsistently, with some AQed costs compared to other non-AQed costs. Strangely enough, fixing this increases SSIM on some sources but decreases it on others. More investigation needed. Account for weighted bipred. Reduce memory, increase precision, simplify, and early terminate. r1204 Add missing free()s for new data allocated for MB-tree Eliminates a memory leak. r1203 Fix keyframe insertion with MB-tree and no B-frames r1202 Fix MP4 output (bug in malloc checking patch) r1201 Gracefully terminate in the case of a malloc failure Fuzz tests show that all mallocs appear to be checked correctly now. r1200 Fix a potential infinite loop in QPfile parsing on Windows ftell doesn't seem to work properly on Windows in text mode. r1199 Fix delay calculation with multiple threads Delay frames for threading don't actually count as part of lookahead. r1198 Add "veryslow" preset Apparently some people are actually *using* placebo, so I've added this preset to bridge the gap. r1197 Macroblock-tree ratecontrol On by default; can be turned off with --no-mbtree. Uses a large lookahead to track temporal propagation of data and weight quality accordingly. Requires a very large separate statsfile (2 bytes per macroblock) in multi-pass mode. Doesn't work with b-pyramid yet. 
Note that MB-tree inherently measures quality differently from the standard qcomp method, so bitrates produced by CRF may change somewhat. This makes the "medium" preset a bit slower. Accordingly, make "fast" slower as well, and introduce a new preset "faster" between "fast" and "veryfast". All presets "fast" and above will have MB-tree on. Add a new option, --rc-lookahead, to control the distance MB tree looks ahead to perform propagation analysis. Default is 40; larger values will be slower and require more memory but give more accurate results. This value will be used in the future to control ratecontrol lookahead (VBV). Add a new option, --no-psy, to disable all psy optimizations that don't improve PSNR or SSIM. This disables psy-RD/trellis, but also other more subtle internal psy optimizations that can't be controlled directly via external parameters. Quality improvement from MB-tree is about 2-70% depending on content. Strength of MB-tree adjustments can be tweaked using qcompress; higher values mean lower MB-tree strength. Note that MB-tree may perform slightly suboptimally on fades; this will be fixed by weighted prediction, which is coming soon. r1196 Various 1-pass VBV tweaks Make predictors have an offset in addition to a multiplier. This primarily fixes issues in sources with lots of extremely static scenes, such as anime and CGI. We tried linear regressions, but they were very unreliable as predictors. Also allow VBV to be slightly more aggressive in raising QPs to avoid not having enough bits left in some situations. Up to 1db improvement on some clips. r1195 Fix another 10L in QPRD An entry in subpel_iterations was missing. I have no idea how QPRD was working at all without this change. r1194 Update help and cleanup in ratecontrol.c Deal with some out-of-date information. 
r1193 15% faster refine_bidir_satd, 10% faster refine_bidir_rd (or less with trellis=2) re-roll a loop (saves 44KB code size, which is the cause of most of this speed gain) don't re-mc mvs that haven't changed r1192 Faster bidir_rd plus some bugfixes Cache chroma MC during refine_bidir_rd and use both the luma and chroma caches to skip MC in macroblock_encode. Fix incorrect call to rd_cost_part; refine_bidir_rd output was incorrect for i8>0. Remove some redundant clips. ~12% faster refine_bidir_rd. r1191 Add "fastdecode" tune option It does what it says it does. r1190 Fix two bugs in QPRD fprofile settings now actually fprofile QPRD. Don't use i_mbrd before initializing it. r1189 Fix 10l in QPRD Trellis used wrong lambda with trellis=1 r1188 Fix a nondeterminism with threads and subme>7 Also add a few more checks to eliminate the need for spel_border. r1187 Add QPRD support as subme=10 Refactor trellis lambda selection to be done in analyse_init instead of in trellis. This will allow for more easy adaption of lambda later on; for now it allows constant lambda across variable QPs. QPRD is only available with adaptive quantization enabled and generally improves SSIM and visual quality. Additionally, weight the SSD values from RD based on the relative QP offset for chroma; helps visually at high QPs where chroma has a lower QP than luma. This fixes some visual artifacts created by QPRD at high QPs. Note that this generally hurts PSNR and SSIM, and so is only on when psy-RD is on. r1186 SSSE3 cachesplit workaround for avg2_w16 Palignr-based solution for the most commonly used qpel function. 1-1.5% faster overall on Core 2 chips. r1185 shut up valgrind warnings in trellis r1184 New AQ algorithm option "Auto-variance" uses log(var)^2 instead of log(var) and attempts to adapt strength per-frame. Generates significantly better SSIM; on by default with --tune ssim. Whether it generates visually better quality is still up for debate. Available as --aq-mode 2. 
r1183 Cacheline-split SSSE3 chroma MC ~70% faster chroma MC on 32-bit Conroe Also slightly faster SSSE3 intra_sad_8x8c r1182 Improve documentation of qp/crf options r1181 Merge array_non_zero into zigzag_sub Faster lossless, cleaner code. SSSE3 version of zigzag_sub_4x4_field, faster lossless interlaced coding. r1180 Fix bug in reference frame autoadjustment For some types of input file, x264 did the adjustment before width/height were known. r1179 Fix fprofile settings to match changes in defaults Also add b-adapt 2 to fprofile. r1178 Slightly faster dequant_flat assembly Eliminate some redundant shifts. r1177 Totally new preset system for x264.c (not libx264), new defaults Other new features include "tune" and "profile" settings; see --help for more details. Unlike most other settings, "preset" and "tune" act before all other options. However, "profile" acts afterwards, overriding all other options. Our defaults have also changed: new defaults are --subme 7 --bframes 3 --8x8dct --no-psnr --no-ssim --threads auto --ref 3 --mixed-refs --trellis 1 --weightb --crf 23 --progress. Users will hopefully find these changes to greatly improve usability. r1176 Update Gabriel's email address in AUTHORS r1175 Early termination for chroma encoding Faster chroma encoding by terminating early if heuristics indicate that the block will be DC-only. This works because the vast majority of inter chroma blocks have no coefficients at all, and those that do are almost always DC-only. Add two new helper DSP functions for this: dct_dc_8x8 and var2_8x8. mmx/sse2/ssse3 versions of each. Early termination is disabled at very low QPs due to it not being useful there. Performance increase is ~1-2% without trellis, up to 5-6% with trellis=2. Increase is greater with lower bitrates. r1174 Fix bug in checkasm frame_init_lowres_core check didn't check the C plane. However, all x86 and PPC assembly was correct regardless of the unit test being incorrect. 
r1173 Add subpartition cost for sub-8x8 blocks Improves sub-p8x8 mode decision. r1172 Yet more CABAC and CAVLC optimizations Also clean up a lot of pointless code duplication in CAVLC MV coding. r1171 Various CABAC optimizations and cleanups Faster CABAC CBF context calculation for inter blocks. Add x264_constant_p(), will probably be useful in the future as well. Simpler subpartition functions. Clean up and optimize mvd_cpn a bit more. Various other minor optimizations. r1170 AltiVec version of frame_init_lowres_core. 22.4x faster than C on PPC7450 and 25x on PPC970MP. r1169 MMX CABAC mvd sum calculation Faster CABAC mvd coding. r1168 Faster MV prediction Smaller code size, plus I get to use goto. r1167 Fix potential crash in checkasm ssim_end4_sse2 requires aligned sums r1166 SSSE3, faster SSE2/MMX integral_init4v The real reason I wrote this was an excuse to use shufpd. r1165 configure check for uclinux r1164 fix a crash on frame width <= 48 pixels r1163 configure check for cc, rather than reporting lack of compiler as an asm error. configure check for -mno-cygwin, since it's removed from gcc4. r1162 a better way to keep track of mv candidates. 2-4% faster dia, hex, and umh. r1161 reorder some motion estimation patterns. this change is useless on its own, but segregates the bitstream-changing part out of my next optimization. r1160 Fix VBV warning broken in r915 x264 will now correctly warn about maxrate specified without bufsize even when a level is not set. r1159 configure check for ssse3-capable binutils r1158 Fix 10L in r1155 Broke --me esa/tesa due to forgetting to add handling for x264_cost_mv_fpel. r1157 Fix bug where satd was incorrectly used with subme<=1 Faster subme<=1 with i4x4 enabled. r1156 Remove some pointless error handling code in cabac/cavlc r1155 Save some memory on mv cost arrays Have quantizers that use the same lambda share the same cost array. 
r1154 Various CABAC and CAVLC optimizations Backport CAVLC partial-inlining early termination to CABAC (~2-4% faster CABAC residual coding) r1153 fix a race condition at the end of thread_input r1152 Various trellis speed optimizations r1151 Make i686 the default arch on x86_32 Disabling asm will default to a generic arch. Also fix configure for gcc 4.4. r1150 Faster signed golomb coding 3% faster CAVLC RDO and bitstream writing. r1149 Faster spatial direct MV prediction unroll/tweak col_zero_flag r1148 More CABAC and CAVLC optimizations Simplified function calling for block_residual_write_(cabac|cavlc) and improved sigmap coding. Tried making 0/1-bit specific versions of CABAC asm, but benefit was minimal under GCC 4.3. Helped a decent bit under 3.4, but you shouldn't be using such old versions anyways. r1147 Various optimizations in frametype lookahead r1146 Some cosmetics/cleanup Move some macros to x86util.asm that should have been there to begin with. Fix a typo that didn't cause any issues. r1145 fix "incompatible types in initialization" compilation issues with GCC 4.3 (which is stricter than previous compiler version) r1144 fix conversions between vectors with differing element types or numbers of subparts errors r1143 Add "coded blocks" stat to output information. This measures the total percentage of blocks, intra and inter, which have nonzero coefficients. "y,uvAC,uvDC" refers to luma, chroma DC, and chroma AC blocks. Note that skip blocks are included in this stat. r1142 Enable asm predict_8x8_filter I'm not entirely sure how this snuck its way out of holger's intra pred patch. r1141 Remove various bits of dead code found by CLANG. r1140 Slightly faster SSE4 SA8D, SSE4 Hadamard_AC, SSE2 SSIM shufps is the most underrated SSE instruction on x86. r1139 Various CABAC optimizations Move calculation of b_intra out of the core residual loop and hardcode it where applicable. 
Inlining cabac_mb_mvd was unnecessary and wasted tremendous amounts of code size. Inlining only cache_mvd is faster and significantly smaller. r1138 CAVLC optimizations faster bs_write_te, port CABAC context selection optimization to CAVLC. r1137 Faster CABAC RDO Since the bypass case is quite unlikely, especially when doing merged sigmap/level coding, it's faster to use a branch than a cmov. r1136 Activate intra_sad_x3_8x8c in lookahead r1135 MBAFF interlaced coding is not allowed in baseline profile r1134 intra_sad_x3_8x8 assembly r1133 intra_sad_x3_4x4 assembly r1132 intra_sad_x3_8x8c assembly Also fix intra_sad_x3_16x16's use of "n" as a loop variable (broke SWAP) r1131 Shave one instruction off CABAC encode_decision range_lps>>6 ranges from 4-7, so (range_lps>>6)-4 == (range_lps>>6) & 3 r1130 Faster probe_skip Add a second chroma threshold after the DC transform. r1129 Add missing "static" qualifier to two arrays Should slightly improve performance. r1128 SSE2 zigzag_interleave Replace PHADD with FastShuffle (more accurate naming). This flag represents asm functions that rely on fast SSE2 shuffle units, and thus are only faster on Phenom, Nehalem, and Penryn CPUs. r1127 Faster integral_init palignr to avoid unaligned loads is worth it in inith, but not initv. r1126 Faster SSSE3 hpel_filter_v ~10% faster hpel_filter on 64-bit Penryn. 32-bit version by Jason Garrett-Glaser. r1125 Faster SSE2 pixel_var Optimized using the DEINTB method from r1122. ~32% faster var_16x16 on Conroe. r1124 SSSE3 hpel_filter_v Optimized using the same method as in r1122. Patch partially by Holger. ~8% faster hpel filter on 64-bit Nehalem r1123 Update some asm copyright headers r1122 Vastly faster SATD/SA8D/Hadamard_AC/SSD/DCT/IDCT Heavily optimized for Core 2 and Nehalem, but performance should improve on all modern x86 CPUs. 
16x16 SATD: +18% speed on K8(64bit), +22% on K10(32bit), +42% on Penryn(64bit), +44% on Nehalem(64bit), +50% on P4(32bit), +98% on Conroe(64bit) Similar performance boosts in SATD-like functions (SA8D, hadamard_ac) and somewhat less in DCT/IDCT/SSD. Overall performance boost is up to ~15% on 64-bit Conroe. r1121 Update x264 copyright date r1120 Remove pre-scenecut from fprofile commands as well Also add psy-trellis to fprofile r1119 Slightly faster 8x16 SAD on Penryn Core 2 Same as MMX 8x16 cacheline SAD, but calls SSE2 8x16 SAD in non-cacheline case. Only Nehalem benefits from sizes smaller than 8x16, and Nehalem doesn't use cacheline functions, so no smaller versions are included. r1118 Fix scenecut and VBV with videos of width/height <= 32 Also remove an unused variable r1117 Remove non-pre scenecut Add support for no-b-adapt + pre-scenecut (patch by BugMaster) Pre-scenecut was generally better than regular scenecut in terms of accuracy and regular scenecut didn't work in threaded mode anyways. Add no-scenecut option (scenecut=0 is now no scenecut; previously it was -1) Fix an incorrect bias towards P-frames near scenecuts with B-adapt 2. Simplify pre-scenecut code. r1116 Add AltiVec version of hadamard_ac. 2.4x faster than the C version. Note that this implementation is pretty naive and should be improved by implementing what's discussed in this ML thread: date: Mon, Feb 2, 2009 at 6:58 PM subject: Re: [x264-devel] [PATCH] AltiVec implementation of hadamard_ac routines r1115 Fix regression in r1085 Deblocking was very slightly incorrect with partitions=all. Bug found by BugMaster. r1114 Optimize neighbor CBP calculation and fix related regression r1105 introduced array overflow in cbp handling r1113 Show FPS when importing a raw YUV file r1112 Windows 64-bit support A "make distclean" is probably required after updating to this revision. r1111 Minor fixes and cosmetics Suppress a GCC warning, fix a non-problematic array overflow, one REP->REP_RET. 
r1110 fix 10l in 75b495f2723fcb77f Original thread: date: Mon, Feb 9, 2009 at 9:37 PM commit: Spare a vec_perm and a vec_mergeh though using a LUT of permutation vectors. (Guillaume Poirier): Mon Feb 9 21:17:33 2009 +0100 Spare a vec_perm and a vec_mergeh though using a LUT of permutation vectors. r1108 Promote chroma planes to 16 byte alignment. This will allow simplifying vector loads that can only load 16-byte aligned data (such as AltiVec). r1107 Fix 10L in intra pred Forgetting a %define resulted in SIGILL on 32-bit systems without SSE (e.g. Athlon XP). r1106 Add decimation in i16x16 blocks Up to +0.04db with CAVLC, generally a lot less with CABAC. r1105 Much faster CABAC residual context selection Up to ~17% faster CABAC RDO, ~36% faster intra-only CABAC RDO. Up to 7% faster overall in extreme cases. r1104 Faster coeff_last64 on 32-bit r1103 More intra pred asm optimizations SSSE3 version of predict_8x8_hu SSE2 version of predict_8x8c_p SSSE3 versions of both planar prediction functions Optimizations to predict_16x16_p_sse2 Some unnecessary REP_RETs -> RETs. SSE2 version of predict_8x8_vr by Holger. SSE2 version of predict_8x8_hd. Don't compile MMX versions of some of the pred functions on x86_64. Remove now-useless x86_64 C versions of 4x4 pred functions. Rewrite some of the x86_64-only C functions in asm. r1102 Speed-up mc_chroma_altivec by using vec_mladd cleverly, and unrolling. Also put width == 2 variant in its own scalar function because it's faster than a vectorized one. r1101 Merging Holger's GSOC branch part 2: intra prediction Assembly versions of most remaining 4x4 and 8x8 intra pred functions. Assembly version of predict_8x8_filter. A few other optimizations. Primarily Core 2-optimized. r1100 10l: fix compilation with GCC 4.3+ r1099 Faster 8x8dct+CAVLC interleave Integrate array_non_zero with the CAVLC 8x8dct interleave function. Roughly 1.5-2x faster than the original separate array_non_zero method. 
r1098 Measure CBP cost in i8x8 RD refinement ~0.02-0.05db PSNR gain at high quants in intra-only encoding, pretty small otherwise. Allows a small optimization in i8x8 encoding. r1097 Take advantage of saturated signed horizontal sum instructions in the variance computation epilogue since there won't be any overflow triggering an overflow. Suggested by Loren Merritt r1096 Massive overhaul of nnz/cbp calculation Modify quantization to also calculate array_non_zero. PPC assembly changes by gpoirior. New quant asm includes some small tweaks to quant and SSE4 versions using ptest for the array_non_zero. Use this new feature of quant to merge nnz/cbp calculation directly with encoding and avoid many unnecessary calls to dequant/zigzag/decimate/etc. Also add new i16x16 DC-only iDCT with asm. Since intra encoding now directly calculates nnz, skip_intra now backs up nnz/cbp as well. Output should be equivalent except when using p4x4+RDO because of a subtlety involving old nnz values lying around. Performance increase in macroblock_encode: ~18% with dct-decimate, 30% without at CRF 25. Overall performance increase 0-6% depending on encoding settings. r1095 Add PowerPC support for "checkasm --bench", reading the time base register. This isn't ideal since the `time base' register is running at a fraction of the processor cycle speed, so the measurement isn't as precise as x86's rdtsc. It's better than nothing though... r1094 fix detection of pthread and isfinite on OpenBSD r1093 remove $ECHON kludge, which broke on SunOS. bring back `gcc -MT`. remove auto-reconfigure on svn update, which has done nothing since we stopped using svn. fix $AS on sparc (was disabled by mmx check). fix --extra-asflags (was ignored). mark bash scripts as bash, not sh patch partly by Greg Robinson and Jugdish. r1092 1.6x faster satd_c (and sa8d and hadamard_ac) with pseudo-simd. 60KB smaller binary. 
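The "pseudo-simd" idea named in r1092 can be illustrated with a short Python sketch: two 16-bit lanes are packed into one 32-bit word so that a single integer add or subtract updates both lanes at once. This shows the general technique only; it is not x264's C code, and the helper names (pack, unpack, packed_add, packed_sub) are hypothetical. It assumes every lane value stays within the signed 16-bit range.

```python
MASK32 = 0xFFFFFFFF  # emulate a 32-bit unsigned register

def pack(lo: int, hi: int) -> int:
    """Pack two signed 16-bit lanes into one 32-bit word."""
    return (lo + (hi << 16)) & MASK32

def to_signed16(x: int) -> int:
    """Interpret the low 16 bits of x as a signed value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def unpack(word: int):
    """Recover both lanes; a negative low lane borrows from the high lane."""
    lo = to_signed16(word)
    hi = to_signed16((word - lo) >> 16)
    return lo, hi

def packed_add(a: int, b: int) -> int:
    """One 32-bit add performs two 16-bit lane additions."""
    return (a + b) & MASK32

def packed_sub(a: int, b: int) -> int:
    """One 32-bit subtract performs two 16-bit lane subtractions."""
    return (a - b) & MASK32

added = unpack(packed_add(pack(3, -5), pack(-7, 2)))
subbed = unpack(packed_sub(pack(10, 20), pack(15, 5)))
print(added, subbed)
```

The butterfly steps of a Hadamard transform are only additions and subtractions, which is why packing lanes this way can roughly halve the number of scalar operations in plain C.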
r1091 Hack around a potential failure point in VBV pred_b_from_p can become absurdly large in static scenes, leading to rare collapses of quality with VBV+B-frames+threads. This isn't a final fix, but should resolve the problem in most cases in the meantime. r1090 Much faster chroma encoding and other opts ~15% faster chroma encode by reorganizing CBP calculation and adding special-case idct_dc function, since most coded chroma blocks are DC-only. Small optimization in cache_save (skip_bp) Fix array_non_zero to not violate strict aliasing (should eliminate miscompilation issues in the future) Add in automatic substitutions for some asm instructions that have an equivalent smaller representation. r1089 add AltiVec implementation of x264_mc_copy_w16_aligned r1088 add AltiVec implementation of x264_pixel_var_16x16 and x264_pixel_var_8x8 r1087 add AltiVec 16 <-> 32 bits conversions macros r1086 Replace 16x16=>32 mul + pack + add by a simple 16x16=>16 multiply-add. Suggested by Loren. r1085 Eliminate support for direct_8x8_inference=0 The benefit in the most extreme contrived situation was at most 0.001db PSNR, at the cost of slower decoding. As this option was basically useless, it was a waste of code and prevented some other useful optimizations. Remove some unused mc code related to sub-8x8 partitions. Small deblocking speedup when p4x4 is used. Also remove unused x264_nal_decode prototype from x264.h. r1084 Add AltiVec and CPU numbers detection on OpenBSD. r1083 Add AltiVec implementation of predict_8x8c_p. 2.6x faster than scalar C. r1082 Warn if direct auto wasn't set on the first pass And, if it wasn't, run direct auto as if it was the first pass, rather than simply forcing temporal direct mode on all frames. Also a small tweak to coeff_level_run asm. r1081 Changes the PowerPC ppccommon.h header so it no longer checks for a particular OS such as Linux but instead looks for HAVE_ALTIVEC_H being set. Fixes all *BSD/PowerPC builds. 
r1080 update x264_hpel_filter_altivec's prototype to match the one of the C version. in commit 045ae4045a1827555b3eaab4fbf3c9809e98c58f (factorization of mallocs) or: Guillaume Poirier <gpoirier@mplayerhq.hu> Date:Wed Jan 14 21:49:42 2009 +0100 rename vector+array unions to closer match the vector typedefs names. r1078 Add Altivec implementation of all the remaining 16x16 predict routines. r1077 Cache ref costs and use more accurate MV costs New MV costs should improve quality slightly by improving the smoothness of the field of MV costs (and they're closer to CABAC's actual costs). Despite being optimized for CABAC, they still help under CAVLC, albeit less. MV cost change by Loren Merritt r1076 Support forced frametypes with scenecut/b-adapt This allows an input qpfile to be used to force I-frames, for example. The same can be done through the library interface. Document the format of the qpfile in --longhelp and the forcing of frametypes in x264.h Note that forcing B-frames and B-refs may not always have the intended result. Patch partially by Steven Walters <kemuri9@gmail.com>. r1075 Remove an IDIV from i8x8 analysis Only one IDIV is left in macroblock level code (transform_rd) r1074 Fix regression in r1066 With some combinations of video width and other settings, the scratch buffer was slightly too small. This caused heap corruption on some systems. Also prevent merange from being raised during encoding with esa/tesa through encoder_reconfig, as this no longer works. r1073 Disable B-frames in lossless mode They hurt compression anyways, and direct auto was bugged with lossless. r1072 Factorize in ppccommon.h the conditional inclusion of altivec.h on Linux systems. r1071 Disable __builtin_clz() intrinsic on gcc versions prior to 3.4. The function did not exist before that version. 
r1070 Small tweaks to coeff asm Factor out a few redundant pxors Related cosmetics r1069 Use the correct strtok under MSVC Also change one malloc -> x264_malloc r1068 Add stack alignment for lookahead functions Should allow libx264 to be called from non-gcc-compiled applications without adding force_align_arg_pointer. r1067 Add support for SSE4a (Phenom) LZCNT instruction Significantly speeds up coeff_last and coeff_level_run on Phenom CPUs for faster CAVLC and CABAC. Also a small tweak to coeff_level_run asm. r1066 factor mallocs out of hpel, ssim, and esa. there should now be no memory allocation outside of init-time. r1065 Much faster CAVLC RDO and bitstream writing Pure asm version of level/run coding.Over 2x faster than C. Up to 40% faster CAVLC RDO.Overall benefit up to ~7.5% with RDO or ~5% with fast encoding settings. r1064 Cosmetics: cleaner syntax for defining temporary registers in asm Globally define t#[qdwb], so that only t# needs to be locally defined when reorganizing registers r1063 Much faster CABAC RDO Since RDO doesn't care about what order bit costs are calculated, merge sigmap and level coding into the same loop in RDO. This is bit-exact for 4x4dct but slightly incorrect for 8x8dct due to the sigmap containing duplicated contexts. However, the PSNR penalty of this is extremely small (~0.001db). Speed benefit is about 15% in 4x4dct and 30% in 8x8dct residual bit cost calculation at QP20. Overall encoding speed benefit is up to 5%, depending on encoding settings. Also remove an old unnecessary CABAC table that hasn't been used for years. r1062 VLC table optimizations Slightly reorganize VLC tables for ~2% faster block_residual_write_cavlc. Also a small optimization in p8x8 CAVLC. r1061 Fix crash in --me esa/tesa introduced in r1058 Also suppress the last mingw warning message r1060 Optimize variance asm + minor changes Remove SAD argument from var, not needed anymore. Speed up var asm a bit by eliminating psadbw and instead HADDWing at end. 
Eliminate all remaining warnings on gcc 3.4 on cygwin Port another minor optimization from lavc (pskip) r1059 Minor CABAC cleanups and related optimizations Merge the two list tables to allow cleaner MC/CABAC/CAVLC code Remove lots of unnecessary {s Port some very minor opts from lavc r1058 faster ESA init reduce memory if using ESA and not p4x4 r1057 More macroblock_cache optimizations Patch partially by Loren Merritt r1056 Faster macroblock_cache_rect Explicit loop unrolling r1055 Optimizations in predict_mv_direct Add some early terminations and minor optimizations This change may also fix the extremely rare direct+threading MV bug. r1054 Fix visual corruption when picture width was not mod 32. The previous Altivec implementation of mc_chroma assumed that i_src_stride was always mod 16. r1053 Add support for FSF GCC version >= 4.3 on OSX. So far, only Apple GCC version was supported. r1052 More accurate refcost for p8x8 CAVLC Slightly better quality, especially in non-RD mode, with CAVLC. r1051 use lookup tables instead of actual exp/pow for AQ Significant speed boost, especially on CPUs with atrociously slow floating point units (e.g. Pentium 4 saves 800 clocks per MB with this change). Add x264_clz function as part of the LUT system: this may be useful later. Note this changes output somewhat as the numbers from the lookup table are not exact. r1050 Suppress saveptr warnings on Windows GCC r1049 More small speed tweaks to macroblock.c r1048 Much faster CAVLC residual coding Use a VLC table for common levelcodes instead of constructing them on-the-spot Branchless version of i_trailing calculation (2x faster on Nehalem) Completely remove array_non_zero_count and instead use the count calculated in level/run coding. Note: this slightly changes output with subme > 7 due to different nonzero counts being stored during qpel RD. r1047 fix compilation with GCC-4.3+ r1046 High Profile allows 25% higher maxbitrate/cpb Correct level detection to take this into account.
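The r1046 entry above notes that High Profile permits a 25% higher maximum bitrate and CPB size than Baseline/Main at the same level. A minimal sketch of that scaling, written in Python rather than x264's C, assuming the Baseline/Main MaxBR figures from Table A-1 of the H.264 specification for a handful of levels. The table and function names are illustrative and are not taken from the x264 source.

```python
# Baseline/Main maximum video bitrate per level, in kbit/s
# (subset of H.264 Table A-1; the names here are hypothetical, not x264's).
BASE_MAX_BITRATE_KBPS = {
    "3.0": 10000,
    "3.1": 14000,
    "4.0": 20000,
    "4.1": 50000,
}


def max_bitrate_kbps(level, high_profile):
    """Return the bitrate ceiling for a level, scaled by 1.25 for High Profile."""
    base = BASE_MAX_BITRATE_KBPS[level]
    # Integer arithmetic keeps the result exact: 1.25 == 5/4.
    return base * 5 // 4 if high_profile else base


if __name__ == "__main__":
    for level in sorted(BASE_MAX_BITRATE_KBPS):
        print(level, max_bitrate_kbps(level, False), max_bitrate_kbps(level, True))
```

A level detector that ignored the profile would reject, for example, a 24000 kbit/s High Profile stream at level 4.0 even though the scaled ceiling allows it.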
r1045 s/nasm/yasm in VS project file r1044 Cosmetic: update various file headers. r1043 add date and compiler to `x264 --version` r1042 10L in r1041 r1041 Significantly faster CABAC and CAVLC residual coding and bit cost calculation Early-terminate in residual writing using stored nnz counts To allow the above, store nnz counts for luma and chroma DC Add assembly functions to find the last nonzero coefficient in a block Overall ~1.9% faster at subme9+8x8dct+qp25 with CAVLC, ~0.7% faster with CABAC Note this changes output slightly with CABAC RDO because it requires always storing correct nnz values during RDO, which wasn't done before in cases it wasn't useful. CAVLC output should be equivalent. r1040 dequant_4x4_dc assembly About 3.5x faster DC dequant on Conroe r1039 fix an overflow in dct4x4dc_mmx (unlikely to have occurred in any real video) r1038 Remove nasm support Nasm won't correctly parse the SSE4 code introduced a few revisions ago, so we're removing support. Users should upgrade to yasm 0.6.1 or later. r1037 Fix rare warning messages in ratecontrol due to r1020 r1036 Fix MSVC compilation and clean up MSVC build file Remove Release64 which never worked anyways. r1035 Faster width4 SSD+SATD, SSE4 optimizations Do satd 4x8 by transposing the two blocks' positions and running satd 8x4. Use pinsrd (SSE4) for faster width4 SSD Globally replace movlhps with punpcklqdq (it seems to be faster on Conroe) Move mask_misalign declaration to cpu.h to avoid warning in encoder.c. These optimizations help on Nehalem, Phenom, and Penryn CPUs. r1034 fix indentation, whitespace cleanup, more consistent indentation of macro backslashes r1033 Change some macros to be more sensitive to memory alignment, thus avoiding useless loads/stores and calculations of permutation vectors. Affected functions are all of mc_luma, mc_chroma, 'get_ref', SATD, SA8D and deblock. Gains vary from ~5% to 15% depending on settings, measured on a 1.42 GHz G4. r1032 refactor satd.
20KB smaller binary. refactor sa8d. slightly faster. more checkasm for hadamard. r1031 Fix crash with threads and SSEMisalign on Phenom Misalign mask needed to be set separately for each encoding thread. r1030 Phenom CPU optimizations Faster hpel_filter by using unaligned loads instead of emulated PALIGNR Faster hpel_filter on 64-bit by using the 32-bit version (the cost of emulated PALIGNR is high enough that the savings from caching intermediate values is not worth it). Add support for misaligned_mask on Phenom: ~2% faster hpel_filter, ~4% faster width16 multisad, 7% faster width20 get_ref. Replace width12 mmx with width16 sse on Phenom and Nehalem: 32% faster width12 get_ref on Phenom. Merge cpu-32.asm and cpu-64.asm Thanks to Easy123 for contributing a Phenom box for a weekend so I could write these optimizations. r1029 A few tweaks to decimate asm A little bit faster on both 32-bit and 64-bit r1028 Nehalem optimization part 2: SSE2 width-8 SAD Helps a bit on Phenom as well ~25% faster width8 multiSAD on Nehalem r1027 Add subme=0 (fullpel motion estimation only) Only for experimental purposes and ultra-fast encoding.Probably not a good idea for firstpass. r1026 Fix minor memory leak in r1022 r1025 r1024 borked checkasm Remove idct/dct2x2 from checkasm as they are no longer in dctf r1024 Faster chroma encoding 9-12% faster chroma encode. Move all functions for handling chroma DC that don't have assembly versions to macroblock.c and inline them, along with a few other tweaks. r1023 Various cosmetics and minor fixes Disable hadamard_ac sse2/ssse3 under stack_mod4 Fix one MSVC compilation warning Fix compilation in debug mode in certain cases on x64 Remove eval.c from MSVC project Fix crash when VBV is used in CQP mode Patches by MasterNobody r1022 Faster b-adapt + adaptive quantization Factor out pow to be only called once per macroblock.Speeds up b-adapt, especially b-adapt 2, considerably. Speed boost is as high as 24% with b-adapt 2 + b-frames 16. 
r1021 Faster CABAC residual encoding 6% faster block_residual_write_cabac in RD mode. r1020 Fix potential crash in the case that the input statsfile is too short Also resolve various other potential weirdness (such as multiple copies of the same error message in threaded mode). r1019 Initial Nehalem CPU optimizations movaps/movups are no longer equivalent to their integer equivalents on the Nehalem, so that substitution is removed. Nehalem has a much lower cacheline split penalty than previous Intel CPUs, so cacheline workarounds are no longer necessary. Intel for providing Avail Media with the pre-release Nehalem CPU needed to prepare these (and other not-yet-committed) optimizations. or: Gabriel Bouvigne <bouvigne@mp3-tech.org> Date:Tue Nov 4 09:56:03 2008 -0800 Fix potential infinite loop in VBV under GCC 4.2 r1017 Encoder_reconfig: esa/tesa can only be enabled if they were on to begin with Bug report by kemuri-_9. r1016 Fix bug in hadamard_ac SSE assembly Some extreme inputs could cause overflows. r1015 Full sub8x8 RD mode decision Small speed penalty with p4x4 enabled, but significant quality gain at subme >= 6 As before, gain is proportional to the amount of p4x4 actually useful in a given input at the given bitrate. r1014 Optimize CABAC bit cost calculation Speed up cabac mvd and add new precalculated transition/entropy table. Add "noup" function for cabac operations to not update the state table when it isn't necessary. 1-3% faster macroblock_size_cabac. 
Cosmetics r1013 Replace "git-command" with "git command" in version.sh for git 1.6 support r1012 Add assembly version of CAVLC 8x8dct interleave Faster CAVLC encoding and RDO with 8x8dct r1011 Add support for psy-rd/trellis to encoder_reconfig r1010 Fix Darwin speed regression r1009 Further improve prediction of bitrate and VBV in threaded mode r1008 Sub-8x8 Qpel-RD in P-frames Improves quality when using p8x4/p4x8/p4x4 subpartitions Benefit is proportional to how many sub-8x8 partitions are used; helps most at high bitrates and low resolutions. r1007 Faster qpel-RD 3-4% faster qpel-RD; avoid re-checking bmv/pmv during the hex search. r1006 Some minor optimizations in RD refinement Don't write b subpartition in CABAC RDO Calculate nonzero count in i4x4 CAVLC RDO r1005 Faster deblocking when p4x4 isn't used Most of the MV checks can be skipped, resulting in faster strength calculation r1004 Print profile and level information upon starting encode Previously level was only printed as part of autodetect, and only in verbose mode. r1003 Fix possible crash in trellis at very low QPs r1002 Add assembly versions of decimate_score 3-7x faster decimation, 1-3% faster overall r1001 Fix typo in subme8/9 lossless qpel-RD Slightly improves compression. r1000 Extend trellis to support luma/chroma DC and chroma AC Small speed loss in trellis 1, slightly larger in trellis 2, but significant quality improvement. r999 rm gtk, avc2avi. I don't remember why I allowed a gui into the repository in the first place. There's nothing that makes this one special relative to all the other x264 guis. avc2avi doesn't compile since we removed the bitstream reader. And avc doesn't belong in avi. r998 Resolve quality regression in r996 Accidentally removed the wrong line of code.I think this classifies as a "10l". Thanks to techouse for initial bug report and skystrife for helping me find it. 
r997 Fix minor memory leak accidentally added with the addition of b-adapt 2 r996 Rework subme system, add RD refinement in B-frames The new system is as follows: subme6 is RD in I/P frames, subme7 is RD in all frames, subme8 is RD refinement in I/P frames, and subme9 is RD refinement in all frames. subme6 == old subme6, subme7 == old subme6+brdo, subme8 == old subme7+brdo, subme9 == no equivalent. --b-rdo has, accordingly, been removed. --bime has also been removed, and instead enabled automatically at subme >= 5. RD refinement in B-frames (subme9) includes both qpel-RD and an RD version of bime. r995 Fix potential miscompilation of some inline asm Caused problems under some gcc 4.x versions with predictive lossless r994 Replace High 4:4:4 profile lossless with High 4:4:4 Predictive. This improves lossless compression by about 4-25% depending on source. The benefit is generally higher for intra-only compression. Also add support for 8x8dct and i8x8 blocks in lossless mode; this improves compression very slightly. In some rare cases 8x8dct can hurt compression in lossless mode, but it's usually helpful, albeit marginally. Note that 8x8dct is only available with CABAC as it is never useful with CAVLC. High 4:4:4 Predictive replaced the previous profile in a 2007 revision to the H.264 standard. The only known compliant decoder for this profile is the latest version of CoreAVC. As I write this, JM does not actually correctly decode this profile. This lack of support will soon change with this commit, as x264 will be (to my knowledge) the first compliant encoder.
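The subme renumbering described under r996 can be summarized as a small lookup table. The sketch below is in Python and only encodes the equivalences stated in that entry. Combinations the entry does not mention (for example old subme7 without --b-rdo, or any old value below 6) are deliberately left unmapped, and the function names are hypothetical.

```python
# (old_subme, old_b_rdo_enabled) -> new subme, exactly as listed under r996.
OLD_TO_NEW_SUBME = {
    (6, False): 6,  # RD in I/P frames
    (6, True): 7,   # RD in all frames
    (7, True): 8,   # listed as "subme8 == old subme7+brdo"
}


def translate_subme(old_subme, b_rdo):
    """Return the new subme level, or None when r996 states no equivalent."""
    return OLD_TO_NEW_SUBME.get((old_subme, b_rdo))


def bime_enabled(new_subme):
    """--bime was removed; r996 enables it automatically at subme >= 5."""
    return new_subme >= 5


if __name__ == "__main__":
    print(translate_subme(6, True))   # old "--subme 6 --b-rdo"
    print(translate_subme(7, False))  # not covered by the changelog entry
```

The new subme9 has no entry in the table because the changelog says it has no old equivalent.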
:Fri Sep 26 09:19:56 2008 -0700 Fix typo in progress indicator when using piped input r992 avg_weight_ssse3 r991 fix bitstream writer on bigendian 64bit (regression in r903) r990 remove authors whose code no longer exists r989 more diagnostics when configure finds an unsuitable assembler r988 Make x264 progress indicator more concise Now the % indicator should be readable on the header of a minimized window on Windows systems. r987 Fix deblocking + threads + AQ bug At low QPs, with threads and deblocking on, deblocking could be improperly disabled. Revision in which this bug was introduced is unknown; it may be as old as b_variable_qp in x264 itself. r986 Resolve possible crash in bime, improve the fix in r985 r985 Fix rare crash issue in b-adapt Regression *probably* in r979 r984 Merging Holger's GSOC branch part 1: hpel_filter speedups r983 r980 borked weighted bime r982 Disable I_PCM with psy-RD psy-RD seems to put the PCM threshold a bit lower than it should be, so PCM is now disabled under psy-RD. r981 Merge avg and avg_weight avg_weight no longer has to be special-cased in the code; faster weightb r980 Rewrite avg/avg_weight to take two source pointers This allows the use of get_ref instead of mc_luma almost everywhere for bipred r979 Use low-resolution lookahead motion vectors as an extra predictor Improves quality considerably (0-5%) in 1pass/CRF mode, especially with lower --me values and complex motion. Reverses the order of lowres lookahead search to improve the usefulness of the extra predictors. r978 Add missing free() for f_qp_offset in frame.c r977 Correct misprediction of bitrate in threaded mode Improves bitrate accuracy in cases with large numbers of threads. Loosely based on a patch by BugMaster. r976 Fix a case in which VBV underflows can occur Fix a potential case where a frame might be initially allocated too low a QP, which would then have to be raised a lot during row-based ratecontrol.
In some cases, this could even produce VBV underflows in 2pass mode. r975 Use correct format specifier for uint64_t r974 Cache motion vectors in lowres lookahead This vastly speeds up b-adapt 2, especially at large bframes values. This changes output because now MV prediction in lookahead only uses L0/L1 MVs, not bidir.This isn't a problem, since the bidir prediction wasn't really correct to begin with, so the change in output is neither positive nor negative. This also allowed the removal of some unnecessary memsets, which should also give a small speed boost. Finally, this allows the use of the lowres motion vectors for predictors in some future patch. r973 Fix regression in b-adapt patch: encoder_open failed for multipass encodes without bframes. r972 Stop SAR in y4m input from overriding --sar on commandline r971 hadamard_ac for psy-rd c version is 1.7x faster than satd+sa8d+sad ssse3 version is 2.3x faster than satd+sa8d+sad r970 Psychovisually optimized rate-distortion optimization and trellis The latter, psy-trellis, is disabled by default and is reserved as experimental; your mileage may vary. Default subme is raised to 6 so that psy RD is on by default. r969 Add optional more optimal B-frame decision method This method (--b-adapt 2) uses a Viterbi algorithm somewhat similar to that used in trellis quantization. Note that it is not fully optimized and is very slow with large --bframes values. It also takes into account weightb, which should improve fade detection. Additionally, changes were made to cache lowres intra results for each frame to avoid recalculating them.This should improve performance in both B-frame decision methods. This can also be done for motion vectors, which will dramatically improve b-adapt 2 performance when it is complete. 
This patch also reads b_adapt and scenecut settings from the first pass so that the x264 header information in the output file will have correct information (since frametype decision is only done on the first pass). r968 Move adaptive quantization to before ratecontrol, eliminate qcomp bias This change improves VBV accuracy and improves bit distribution in CRF and 2pass. Instead of being applied after ratecontrol, AQ becomes part of the complexity measure that ratecontrol uses. This allows for modularity for changes to AQ; a new AQ algorithm can be introduced simply by introducing a new aq_mode and a corresponding if in adaptive_quant_frame. This also allows quantizer field smoothing, since quantizers are calculated beforehand rather than during encoding. Since there is no more reason for it, aq_mode 1 is removed. The new mode 1 is in a sense a merger of the old modes 1 and 2. WARNING: This change redefines CRF when using AQ, so output bitrate for a given CRF may be significantly different from before this change! r967 Fix crash when using b-adapt at resolutions 32x32 or below. Original patch by BugMaster, but was mostly rewritten in order to make b-adapt actually *work* at such resolutions, not merely stop crashing. r966 Add title-bar progress indicator under WIN32 Also add bitrate-so-far output when piping data to x264 (total frames not known) Patch mostly by recover from Doom9. r965 Revert part of r963 In some rare (but significant) cases, the optimized nal_encode algorithm gave incorrect results. r964 Predict 4x4_DC asm Also remove 5-year-old unnecessary #define that reduced speed unnecessarily under MSVC-compiled builds r963 Faster NAL unit encoding and remove unused nal_decode Small speedup at very high bitrates r962 CAVLC cleanup and optimizations Also move some small functions in macroblock.c to a .h file so they can be inlined.
r961 Faster avg_weight assembly Unrolling the loop a bit improves performance r960 Faster H asm intra prediction functions Take advantage of the H prediction method invented for merged intra SAD and apply it to regular prediction, too. r959 Add merged SAD for i16x16 analysis Roughly 30% faster i16x16 analysis under subme=1 r958 Add sad_aligned for faster subme=1 mbcmp Distinguish between unaligned and aligned uses of mbcmp SAD_aligned, for MMX SADs, uses non-cacheline SADs. r957 Improve progress indicator Show average bitrate so far during encoding Decrease update interval for longer encodes (max of 10 frames encoded between updates) r956 Fix speed regression in r951 Row SATDs are only necessary in VBV mode, so don't need to be checked if VBV is off. r955 zigzag asm r954 fix SOFLAGS used when building gtk frontend patch by Markus Kanet %darkvision A gmx P eu% r953 remove the distinction between itex and ptex (changes 2pass statsfile format) r952 hardcode the ratecontrol equation, and remove the rceq option r951 Fix some uses of uninitialized row_satd values in VBV Resolves some issues with QP51 in I-frames with scenecut r950 Activate trellis in p8x8 qpel RD Also clean up macroblock.c with some refactoring Note that this change significantly reduces subme7+trellis2 performance, but improves quality. Issue originally reported by Alex_W. r949 Improve VBV accuracy Don't use the previous frame's row SATD as a predictor if it is too different from this frame's row SATD. r948 improve generation of Darwin libraries Patch by vmrsss %vmrsss A gmail P com% r947 Fix compilation in gcc 3.4.x (issue in r946) Due to a bug in gcc 3.4.x, in certain cases of inlining, the array_non_zero_int_mmx inline assembly is miscompiled and causes a crash with --subme 7 --8x8dct. This minor hack fixes this issue. r946 shut up various gcc warnings r945 fix a crash with invalid args and --thread-input (introduced in r921) r944 drop support for x86_32 PIC.
r943 use permute macros in satd move some more shared macros to x264util.asm r942 cosmetics r941 r940 broke threads r940 Cleanups in macroblock_cache_save/load A bit more loop unrolling, and moving some constant code to the global init function r939 Deblocking code cleanup and cosmetics Convert the style of the deblocking code to the standard x264 style Eliminate some trailing whitespace r938 4% faster deblock: special-case macroblock edges Along with a bit of related code reorganization and macroification r937 Add dedicated variance function instead of using SAD+SSD Faster variance calculation r936 6% faster deblock: remove some clips, earlier termination on low qps. r935 Faster deblocking Early termination for bS=0, alpha=0, beta=0 Refactoring, various other optimizations About 30% faster deblocking overall. r934 asm cosmetics r933 yet another posix-emulating define on solaris r932 update msvc projectfile r931 drop support for msvc6 r930 Prevent VBV from lowering quantizer too much This code seemed to act up unexpectedly sometimes, creating a situation where in 1-pass VBV mode, a frame's quantizer would drop all the way to qpmin and then shoot back upwards to qpmax, causing serious visual issues. This change may decrease bitrate in VBV mode, but that is preferable to the artifacting produced by this code.
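The r937 entry above replaces a SAD+SSD based computation with a dedicated variance function. The identity involved is that the sum of squared deviations from the mean equals the sum of squares minus the squared sum divided by the pixel count, so one pass over the block is enough. A sketch in Python, assuming a block passed as a flat list of pixel values. The function name is hypothetical, and x264's own routine is C/assembly whose exact return convention is not shown in this changelog.

```python
def block_variance_sum(pixels):
    """Sum of squared deviations from the mean, computed in a single pass.

    Uses sum(p^2) - (sum(p))^2 / N, the quantity that a SAD-against-zero
    (plain pixel sum) plus an SSD-against-zero (sum of squares) would give.
    """
    n = len(pixels)
    total = 0
    total_sq = 0
    for p in pixels:
        total += p
        total_sq += p * p
    return total_sq - (total * total) / n


if __name__ == "__main__":
    flat = [128] * 256           # perfectly flat 16x16 block
    checker = [0, 2] * 128       # alternating pixels, mean 1
    print(block_variance_sum(flat), block_variance_sum(checker))
```

Dividing the result by the pixel count gives the per-pixel variance.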
r929 Improve subme7 at low QPs and add subme7 support in lossless mode r928 cosmetics: merge x86inc*.asm r927 Add missing x264util.asm r926 Basic sanity checking of qpmax/qpmin options r925 Fix regression in r922 set the chroma DC coefficients to zero for residual coding in qpel-rd fix C99ism r924 Refactor asm macros part 2: DCT r923 Refactor asm macros part 1: DCT r922 Improve intra RD refine, speed up residual_write_cabac a do/while loop can be used for residual_write, but i8x8 had to be fixed so that it wouldn't call residual_write with zero coeffs proper nnz handling added to cabac intra rd refine chroma cbp added to 8x8 chroma rd cbp was tested, but wasn't useful r921 Fix a few more minor memleaks r920 stats summary: print distribution of numbers of consecutive B-frames r919 add interlacing to the list of stuff checked by x264_validate_levels r918 Fix C99-ism in r907 r917 Faster temporal predictor calculation a separate commit because this changes rounding, and thus changes output slightly. :Thu Jul 17 07:55:24 2008 -0600 Align lowres planes for improved cacheline split performance r915 autodetect level based on resolution/bitrate/refs/etc, rather than defaulting to L5.1 if vbv is not enabled (and especially in crf/cqp), we have to guess max bitrate, so we might underestimate the required level. r914 fix bs_write_ue_big for values >= 0x10000. (no immediate effect, since nothing writes such values yet) r913 Fix lossless mode borked in r901 r912 Relax QPfile restrictions Allow a QPfile to contain fewer frames than the total number of frames in the video and have ratecontrol fill in the rest. Patch by kemuri9. r911 Limit MVrange correctly in interlaced mode Bug report by Sigma Designs, Inc. r910 Fix bug with PCM and adaptive quantization In rare cases CABAC desync could occur, causing bitstream corruption r909 Fix memory leak upon x264 closing Doesn't affect the CLI, but potentially important for programs which call x264 as a shared library. 
r908 Fix compilation on PPC systems (borked in r903) Bigendian systems didn't have endian_fix32 defined r907 Add L1 reflist and B macroblock types to x264 info Also remove display of "PCM" if PCM mode is never used in the encode. L1 reflist information will only show if pyramid coding is used. r906 Fix and enable I_PCM macroblock support In RD mode, always consider PCM as a macroblock mode possibility Fix bitstream writing for PCM blocks in CAVLC and CABAC, and a few other minor changes to make PCM work. PCM macroblocks improve compression at very low QPs (1-5) and in lossless mode. r905 de-duplicate vlc tables r904 faster ue/se/te write r903 faster bs_write r902 cosmetics in ssd asm r901 Various optimizations and cosmetics Update AUTHORS file with Gabriel and me update XCHG macro to work correctly in if statements Add new lookup tables for block_idx and fdec/fenc addresses Slightly faster array_non_zero_count_mmx (patch by holger) Eliminate branch in analyse_intra Unroll loops in and clean up chroma encode Convert some for loops to do/while loops for speed improvement Do explicit write-combining on --me tesa mvsad_t struct Shrink --me esa zero[] array Speed up bime by reducing size of visited[][][] array r900 Resolve floating point exception with frame_init_lowres mmx In some cases, the mmx version of frame_init_lowres could leave the FPU uninitialized for use in ratecontrol, resulting in floating point exceptions. Since frame_init_lowres is such a time-consuming function, an emms was just put at the end, since it costs almost nothing compared to the total time of frame_init_lowres. 
r899 Update my email address r898 Update file headers throughout x264 Update "Authors" lists based on actual authorship; highest is most important Update copyright notices and remove old CVS tags from file headers Add file headers to GTK and other sections missing them Update FSF address Other header-related cosmetics r897 denoise_dct asm r896 cosmetics in permutation macros SWAP can now take mmregs directly, rather than just their numbers r895 Fix bug in adaptive quantization In some cases adaptive quantization did not correctly calculate the variance. Bug reported by MasterNobody r894 lowres_init asm rounding is changed for asm convenience. this makes the c version slower, but there's no way around that if all the implementations are to have the same results. r893 Optimizations and cosmetics in macroblock.c If an i4x4 dct block has no coefficients, don't bother with dequant/zigzag/idct.Not useful for larger sizes because the odds of an empty block are much lower. Cosmetics in i16x16 to be more consistent with other similar functions. Add an SSD threshold for chroma in probe_skip to improve speed and minimize time spent on chroma skip analysis. Rename lambda arrays to lambda_tab for consistency. r892 some asm functions require aligned stack. disable these when compiling with msvc/icc. r891 Move bitstream end check to macroblock level Additionally, instead of silently truncating the frame upon reaching the end of the buffer, reallocate a larger buffer instead. r890 Convert NNZ to raster order and other optimizations Converting NNZ to raster order simplifies a lot of the load/store code and allows more use of write-combining. More use of write-combining throughout load/save code in common/macroblock.c GCC has aliasing issues in the case of stores to 8-bit heap-allocated arrays; dereferencing the pointer once avoids this problem and significantly increases performance. More manual loop unrolling and such. 
Move all packXtoY functions to macroblock.h so any function can use them. Add pack8to32. Minor optimizations to encoder/macroblock.c r889 mc_chroma_sse2/ssse3 r888 checkasm --bench=function_name r887 interleave psnr/ssim computation with reference frame filtering, to improve cache coherency r886 Add more inline asm and a runtime check for MMXEXT support x264 will now terminate gracefully rather than SIGILL when run on a machine with no MMXEXT support. A configure option is now available to build x264 without assembly support for support on such old CPUs as the Pentium 2, K6, etc. r885 Use aligned memcpy for x264_me_t struct and cosmetics r884 Cosmetics and loop unrolling GCC is not very good at loop unrolling in cases where it can perform constant propagation, so the unrolling unfortunately has to be done manually. r883 Fix regression in 64-bit in r882 i_mvc needs to be 64-bit when used with a 64-bit memory pointer r882 More tweaks to me.c Added inline MMX version of UMH's predictor difference test Various cosmetics throughout me.c Removed a C99-ism introduced in r878. r881 Fix regression in r736 r736 added intra RD refinement to B-frames; however, it is possible for subme=7 to be used without b-rdo. This means intra RD isn't run, and therefore it is possible for intra chroma analysis to not have been run, since update_cache was never called for an intra block, and chroma ME is not required even at subme=7. r801, which removed a memset, made this worse because previously the chroma prediction mode was at least initialized to zero; now it was not initialized at all. Therefore, --no-chroma-me, --subme 7, and no --b-rdo had the potential to crash. This change restricts intra RD refinement to only be run when --b-rdo is enabled (sensible to begin with), thus preventing a crash in this case. 
r880 Fix regression in r850 Bug resulted in rare incorrect chroma encoding r879 Cosmetics in VBV handling r878 Tweaks and cosmetics in me.c Use write-combining for predictor checking and other tweaks. r877 Partially inline trellis quantization Inlining trellis into the 4x4/8x8 trellis wrappers increases trellis speed by about 5-10% through constant propagation. r876 Various cosmetic changes. r875 avg_weight_sse2 r874 many changes to which asm functions are enabled on which cpus. with Phenom, 3dnow is no longer equivalent to "sse2 is slow", so make a new flag for that. some sse2 functions are useful only on Core2 and Phenom, so make a "sse2 is fast" flag for that. some ssse3 instructions didn't become useful until Penryn, so yet another flag. disable sse2 completely on Pentium M and Core1, because it's uniformly slower than mmx. enable some sse2 functions on Athlon64 that always were faster and we just didn't notice. remove mc_luma_sse3, because the only cpu that has lddqu (namely Pentium 4D) doesn't have "sse2 is fast". don't print mmx1, sse1, nor 3dnow in the detected cpuflags, since we don't really have any such functions. likewise don't print sse3 unless it's used (Pentium 4D). r873 enable ssse3 phadd satd on Penryn. r872 benchmark most of the asm functions (checkasm --bench). r871 Cosmetic: fix C99-ism r870 Use a gaussian window for cplxblur Cplxblur was originally intended to use a gaussian window, but in its current form did not.This change provides a tiny improvement to 2pass ratecontrol. r869 cosmetics r868 nasm compatible NX stack r867 CQP is incompatible with AQ r866 memzero_aligned_mmx r865 binmode stdin on mingw, not just msvc r864 omit redundant mc after non-rdo dct size decision, and in b-direct rdo r863 allow fractional CRF values with AQ. 
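The r870 entry above switches cplxblur to a gaussian window. As a generic illustration of Gaussian-window smoothing over a per-frame series, and not x264's actual ratecontrol code, the sketch below weights each neighbour by exp(-d²/(2σ²)) and normalizes by the weights actually used, so values near the ends of the series are not biased toward zero. The function name, the `sigma` parameter, and the 3σ cutoff are all assumptions made for this example.

```python
import math


def gaussian_blur_1d(values, sigma):
    """Smooth a sequence with a truncated, normalized Gaussian window."""
    radius = max(1, int(math.ceil(3 * sigma)))  # window cutoff (assumed)
    out = []
    for i in range(len(values)):
        weight_sum = 0.0
        acc = 0.0
        for d in range(-radius, radius + 1):
            j = i + d
            if 0 <= j < len(values):
                w = math.exp(-(d * d) / (2.0 * sigma * sigma))
                weight_sum += w
                acc += w * values[j]
        out.append(acc / weight_sum)
    return out


if __name__ == "__main__":
    complexity = [0.0, 0.0, 10.0, 0.0, 0.0]  # made-up per-frame values
    print(gaussian_blur_1d(complexity, 1.0))
```

Compared with a flat (box) window, the Gaussian weights let nearby frames dominate while distant frames still contribute a little.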
r862 fix some uninitialized partitions in rdo r861 2-pass VBV support and improved VBV handling Dramatically improves 1-pass VBV ratecontrol (especially CBR) and provides support for VBV in 2-pass mode. This consists of a series of functions that attempts to find overflows and underflows in the VBV from the first-pass statsfile and fix them before encoding. 1-pass VBV code partially by Dark Shikari. r860 Fix noise reduction in threaded mode. Previously enabling noise reduction with threads had no effect. Note that this is not an optimal solution; each thread still tracks noise reduction separately (unlike in single-threaded mode). r859 fix a crash on win32 with threads. r852 introduced an assumption in deblock that the stack is aligned. r858 remove nasm version check. a feature check is all that's needed. silence stderr in yasm version check. r857 cosmetics in cabac r856 faster residual_write_cabac r855 change DEBUG_DUMP_FRAME to run-time --dump-yuv r854 x264_median_mv_mmxext this is the first non-runtime-detected use of mmxext, but it has to be inlined r853 factor duplicated code out of deblock chroma mmx r852 deblock_luma_intra_mmx r851 write aspect ratio in mp4 r850 omit delta_quant in i16x16 blocks with no residual (all other block types were already covered, but i16x16 cbp is special) r849 explicit write combining, because gcc fails at optimizing consecutive memory accesses r848 force unroll macroblock_load_pic_pointers and a few other minor optimizations r847 quant_2x2_dc_ssse3 r846 r836 borked lossless cabac nnz r845 use elf instead of a.out on netbsd r844 fix x264_realloc when not using libc realloc. r843 don't pretend to support win64. remove all related code. it hasn't worked since probably some time in 2005, and won't ever be fixed unless someone steps up to maintain it.
r842 cosmetics: replace last instances of parm# asm macros with r# r841 remove DEBUG_BENCHMARK r840 faster probe_skip r839 drop support for pre-SSE3 assemblers r838 s/x264_cpu_restore/x264_emms/ no point in giving it a generic name when it's not generic r837 faster cabac_mb_cbp_luma ported from ffmpeg r836 remove some redundant nnz counts move some nnz counts from macroblock_encode to cavlc if cabac doesn't need them r835 compute missing nnz count in subme7 cavlc r834 remove a division in macroblock-level bookkeeping r833 omit P/B-skip mc from macroblock_encode if the pixels haven't been overwritten since probe_skip r832 earlier termination in SEA if mvcost exceeds residual r831 remove void* arithmetic from r821 r830 Fix define of illegal function identifiers (as defined in section "7.1.3 Reserved identifiers" of C99 spec) r829 Fix define of illegal identifier (as defined in section "7.1.3 Reserved identifiers" of C99 spec) "__UNUSED__", and use the one defined in common/osdep.h, i.e. "UNUSED" based on a patch by Diego Biurrun r828 more consistent include name (in line with other PPC includes) r827 fix illegal identifiers in multiple inclusion guards patch by Diego Biurrun % diego A biurrun P de % r826 AQ now treats perfectly flat blocks as low energy, rather than retaining previous block's QP. fixes occasional blocking in fades. r825 checkasm cabac r824 s/movdqa/movaps/g r823 --asm to allow testing of different versions of asm without recompile r822 copy left neighbor pixels directly from previous mb instead of main plane r821 cacheline split workaround for mc_luma r820 add "SECTION_RODATA" before "SECTION .text" to setup the fakegot label used in macho binaries. 
This fixes compilation with --enable-pic Requires Yasm 0.7.0 or newer Patch by Dave Lee % davelee P com A gmail P com % r819 more hpel fixes r818 update msvc projectfile r817 r810 borked hpel_filter_sse2 on unaligned buffers r816 threads=auto on multicore now implies thread input, just like explicit thread numbers already did r815 dct4 sse2 r814 faster x86_32 dct8 r813 macros to deal with macros that permute their arguments r812 mmx cachesplit sad of non-square sizes checked height instead of width r811 sfence after nontemporal stores r810 simplify hpel filter asm (move control flow to C) and add sse2, ssse3 versions r809 more mmx/xmm macros (mova, movu, movh) r808 improve handling of cavlc dct coef overflows support large coefs in high profile, and clip to allowed range in baseline/main r807 fix shared libs on MacOSX based on a patch by İsmail Dönmez r806 typo in r803 r805 fix a crash on mp4 muxing with invalid params r804 variance-based psy adaptive quantization new options: --aq-mode --aq-strength AQ is enabled by default r803 fix naming of .dll on mingw r802 don't distinguish between mingw and cygwin r801 remove a memset r800 typo. don't evaluate rd pskip when p16x16 found ref>0. r799 r784 borked lossless dc zigzag r798 fix an arithmetic overflow that disabled SEA threshold after finding a mv with SAD < mvcost. r797 fix hpel_filter_altivec picked up by checkasm Patch by Manuel %maaanuuu A gmx.net % and Noboru Asai % noboru P asai A gmail P com % r796 faster residual r795 nasm doesn't like align(nop) in structs r794 reduce the size of some cabac arrays r793 use cabac context transition table from trellis in normal residual coding too r792 rearrange cabac struct to reduce code size r791 higher precision RD lambda improves quality at QP<=12. r790 faster cabac_encode_ue_bypass r789 cabac asm. mostly because gcc refuses to use cmov. 28% faster than c on core2, 11% on k8, 6% on p4. 
r788 cosmetics in cabac r787 inline cabac_size_decision r786 cosmetics in DECLARE_ALIGNED r785 don't distinguish between luma4x4 and luma4x4ac r784 faster lossless zigzag r783 more alignment r782 add tesa and lossless to fprofile r781 cosmetics in residual_write r780 remove unused bitstream reader r779 cosmetics in quant asm r778 special case dequant for flat matrix r777 faster dequant r776 simplify hpel_filter_c r775 use x264_mc_copy_w16_sse2 in mc.copy, it was previously only in mc_luma r774 new ssd_8x*_sse2 align ssd_16x*_sse2 unroll ssd_4x*_mmx r773 update altivec zigzags r772 r768 borked cavlc r771 cosmetics in intra predict r770 faster intra predict 8x8 hu/hd r769 reduce zigzag arrays from int to int16_t r768 reduce the size of some arrays r767 skip intra pred+dct+quant in cases where it's redundant (analyse vs encode) large speedup with trellis=2, small speedup with trellis=0 and/or subme>=6 r766 cosmetics in asm r765 satd_4x4_ssse3 r764 get_ref_sse2 r763 continue instead of crash when the threading mv constraint is violated. doesn't fix the underlying bug, but hopefully less annoying until we find it. r762 remove remaining reference to clip1.h r761 fix name mangling again. apparently it's not just a convention, dll build fails if you try to export a non-prefixed name. r760 update msvc projectfile r759 missing #ifdef HAVE_SSE3 r758 don't define offsetof since it's standard r757 shut up gcc warning in offsetof r756 increase alignment of mv arrays r755 memcpy_aligned_sse2 r754 checkasm check whether callee-saved regs are correctly saved x86_32 only for now since x86_64 varargs are annoying r753 fix x86_32 ads which failed to preserve a register r752 fix some name mangling issues introduced by the merge r751 remove x264_mc_clip1. it's wrong for sufficiently perverse inputs, and clip_uint8 is faster anyway. 
r750 merge x86_32 and x86_64 asm, with macros to abstract calling convention and register names r749 git compatible version script r748 check for broken versions of yasm r747 increase the alignment of the i8x8 edge cache, needed for sse2 intra prediction. patch by Alexander Strange. r746 .gitignore r745 pic macros now keep track of which register holds the GOT, so variable access doesn't have to care r744 remove x86_64 predict_8x8_ddl_mmxext because sse2 is faster even on amd r743 cosmetics in dsp init r742 sse2 16x16 intra pred. port the remaining intra pred functions from x86_64 to x86_32. patch by Dark Shikari. r741 some simplifications to mmx intra pred that should have been done way back when we switched to constant fdec_stride. and remove pic spills in functions that have a free caller-saved reg. patch partly by Dark Shikari. r740 faster array_non_zero r739 x86_32 sse2 idct8 ported from ffmpeg by Dark Shikari r738 checkasm: relax the threshold for floating-point ssim r737 checkasm: test idct with the range of coefficients that can really be encountered, as opposed to random numbers which might overflow. 
r736 intra_rd_refine in B-frames r735 print average of macroblock QPs instead of frame's nominal QP r734 update date r733 remove colorspace conversion support, because it has no business in any codec r732 misc fixes in checkasm r731 remove a useless bit of me=umh (originally copied from JM, where it was used for something) r730 fix a memleak in cqm r729 fix a memleak in mkv muxer patch by saintdev r728 satd exhaustive motion search (--me tesa) r727 fix cabac context for nonzero delta_qp of the 2nd mb of a frame in interlaced mode r726 fix mapping of mvs to partitions in p4x4_chroma patch by Noboru Asai r725 fix mvp for b16x8 and b8x16 L1 search patch by Wei-Yin Chen r724 shave a couple cycles off cabac functions r723 faster and smaller x264_macroblock_cache_mv etc r722 configure test for endianness r721 change the meaning of --ref: it now selects DPB size (including B-frames), rather than L0 size (which B-frames are added to) r720 add / fix support for FreeBSD, based on a patch by Igor Mozolevsky % igor A hybrid-lab P co P uk % r719 shut up some valgrind warnings r718 slightly wrong memory allocation in r717, fixes a potential crash with merange>32 r717 convert absolute difference of sums from mmx to sse2 convert mv bits cost and ads threshold from C to sse2 convert bytemask-to-list from C to scalar asm 1.6x faster me=esa (x86_64) or 1.3x faster (x86_32). (times consider only motion estimation. overall encode speedup may vary.) r716 round esa range to a multiple of 4 r715 use define _WIN32 instead of __WIN32__ or WIN32 defines. MSDN reference: http://msdn2.microsoft.com/en-us/library/b0084kay(VS.80).aspx Patch by BugMaster %BugMaster A narod P ru% Original thread: date: Dec 27, 2007 3:18 AM subject: [x264-devel] VS2008 compilation error (need of replacement __WIN32__ with _WIN32) r714 tweak x264_pixel_sad_x4_16x16_sse2 horizontal sum. 168 -> 166 cycles on core2. r713 fix a nondeterminism involving 8x8dct, rdo, and threads. 
r712 also test arch-specific x264_zigzag_* implementations in checkasm.c Patch by Noboru Asai % noboru P asai A gmail P com% r711 Add AltiVec implementation of - x264_zigzag_scan_4x4_frame_altivec() - x264_zigzag_scan_4x4ac_frame_altivec() - x264_zigzag_scan_4x4_field_altivec() - x264_zigzag_scan_4x4ac_field_altivec() each around 1.3 to 1.8x faster than C version Patch by Noboru Asai % noboru P asai A gmail P com% r710 adds AltiVec implementation of predict_16x16_p() over 4x faster than C version r709 revert the x86_32 part of r708. elf shared libraries aren't important enough to be worth the extra lines of code to check for nasm. r708 mark asm functions as hidden r707 check whether ld supports -Bsymbolic before using it r706 reduce the data type used in some tables. 16KB smaller exe. r705 faster removal of duplicate mv predictors r704 avoid a division in x264_mb_predict_mv_ref16x16. patch by Dark Shikari. r703 avoid a division in umh. patch by Dark Shikari. r702 fix a memleak in h->mb.mvr r701 fix compilation as a shared library on x86_64 (regression in r696) r700 add support for x86_64 on Darwin9.0 (Mac OS X 10.5, aka Leopard) Patch by Antoine Gerschenfeld %gerschen A clipper P ens P fr% r699 cover some more options in fprofile. (esa, bime, cqm, nr, no-dct-decimate, trellis2) previously, esa was slower with fprofile than without, since gcc thought it wasn't important. now esa benefits like anything else. r698 Add AltiVec implementation of x264_pixel_ssd_8x8, 3x faster than C version Overall speed-up: 0.7% with --bframes 3 --ref 5 -m 7 --b-rdo Patch by Noboru Asai %noboru P asai A gmail P com% r697 limit mvs to [-512,511.75] instead of [-512,512] r696 avoid memory loads that span the border between two cachelines. on core2 this makes x264_pixel_sad an average of 2x faster. other intel cpus gain various amounts. amd are unaffected. overall speedup: 1-10%, depending on how much time is spent in fullpel motion estimation. r695 add cache info to cpu_detect. 
also print sse3. r694 cosmetics: reorder mc_luma/mc_chroma/get_ref arguments for consistency with other functions r693 separate pixel_avg into cases for mc and for bipred r692 add AltiVec implementation of ssim_4x4x2_core, about 4x faster than C version. Overall: 0.1-0.2% faster with default encoding settings Patch by Noboru Asai %noboru P asai A gmail P com% r691 Add AltiVec implementation of x264_hpel_filter. Provides a 10-11% overall speed-up with default encoding options Patch by Noboru Asai %noboru P asai A gmail P com% r690 cosmetics in dsp function selection r689 remove sad_pde. it's been unused ever since successive elimination replaced it. r688 cosmetics: use symbolic constants for frame padding radius r687 move hpel_filter cpu detection to a function pointer like everything else r686 cosmetics: use separate variables for frame width and stride r685 Add AltiVec implementation of add4x4_idct, add8x8_idct, add16x16_idct, 3.2x faster on average 1.05x faster overall with default encoding options Patch by Noboru Asai % noboru DD asai AA gmail DD com % r684 add AltiVec implementation of dequant_4x4 and dequant_8x8, 2.8x faster than C, 1.01x faster than previous revision with default encoding options Patch by Noboru Asai % noboru DD asai AA gmail DD com % r683 Add AltiVec implementation of quant_2x2_dc, fix Altivec implementation of quant_(4x4|8x8)(|_dc) wrt current C implementation Patch by Noboru Asai % noboru DD asai AA gmail DD com % r682 fix a possible nondeterminism with me=umh + threads. r681 use hex instead of dia for rdo mv refinement. ~0.5% lower bitrate at subme=7. patch by Dark Shikari. r680 port sad_*_x3_sse2 to x86_64 r679 don't overwrite pthread* namespace, because system headers might define those functions even if we don't want them r678 faster 4x4 sad r677 fix an arithmetic overflow in trellis at high qp. r676 implement multithreaded me=esa r675 fix some integer overflows. now vbv size can exceed 2 Gbit. 
r674 allow --vbv-init to take absolute values (in kbit), in addition to the previous fractions of vbv-bufsize. r673 remove a bashism r672 reorder headers so that largefile support is defined before the first copy of stdio r671 regression in r669: broke saving of configure args if make has to re-run configure r670 regression in r669: --enable-shared should imply --enable-pic on some archs. r669 * Add a --host flag to allow overriding config.guess; this is particularly useful with a 64-bits kernel running a 32-bits userland to build 32-bits apps. * Normalize any host triplet into a quadruplet via config.sub. * Move option parsing before any use of architecture information. r668 * Update config.guess. r667 mingw doesn't have strtok_r r666 move os/compiler specific defines to their own header r665 extend zones to support (some) encoding parameters in addition to ratecontrol. r664 cosmetics r663 limit vertical motion vectors to +/-512, since some decoders actually depend on that limit. r662 Add vertical and horizontal luma deblocking accelerated with Altivec, based on Graham Booker's code written for FFmpeg with slight modifications to re-use x264's macros r661 cosmetics in cpu detection r660 fix compilation without asm on x86_32 (r658 worked only on x86_64). r659 exempt 1080p from the non-mod16 warning. r658 r657 r656 r655 require a ratecontrol method to be specified, it no longer defaults to cqp=26. r654 fix nnz computation in cavlc+8x8dct+deblock. (regression in r607) r653 fix the computation of bits used for vbv. (regression in r651) r652 c89 compile fix r651 cabac: use bytestream instead of bitstream. 35% faster cabac, 20% faster overall lossless, ~1% faster overall at normal bitrates. r650 remove the restriction on number of threads as a function of resolution (it was wrong anyway in the presence of B-frames), and raise the max number of threads in general (though more will have to be done before it can really scale to lots of cores). 
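The r674 entry above says that --vbv-init accepts either a fraction of vbv-bufsize or an absolute value in kbit. One plausible way to tell the two forms apart is sketched below; it is an illustration under stated assumptions, not x264's source. The helper name `initial_vbv_fill_kbit` and the rule that values above 1 are read as kbit are assumptions for this example.

```python
def initial_vbv_fill_kbit(vbv_init, vbv_bufsize_kbit):
    """Return the initial buffer fill in kbit (hypothetical helper).

    Assumption: values <= 1 are fractions of the buffer size,
    larger values are absolute amounts in kbit.
    """
    if vbv_init > 1.0:
        fraction = vbv_init / vbv_bufsize_kbit  # absolute kbit -> fraction
    else:
        fraction = vbv_init
    # The buffer can be neither emptier than empty nor fuller than full
    fraction = min(max(fraction, 0.0), 1.0)
    return fraction * vbv_bufsize_kbit
```

With a 1000 kbit buffer, both `0.5` and `500` would then describe the same half-full starting state.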
r649 tweak ssse3 quant r648 change some tables from int to int8_t. 13KB smaller executable. r647 faster cabac rdo. up to 10% faster at q0, but negligible at normal bitrates. r646 workaround gcc's inability to align variables on the stack. this crash was introduced in r642, but only because previous versions didn't use sse2 on the stack. r645 32bit version of ssse3 satd. switch default assembler to yasm. it will still fallback to nasm if you don't have yasm. r644 simplify trellis r643 fix an arithmetic overflow in trellis with QP >= 42 r642 2x faster quant. 2% overall. side effects: not bit-identical to the previous algorithm. while the new algorithm covers a wider range of cqms than the previous one did, I couldn't find a good way to fallback to a general version for the extreme cqms. so now it refuses to encode extreme cqms instead of just being slower. lays a framework for custom deadzone matrices, though I didn't add an api. r641 when encoding with a cqm, probe_skip now also uses the cqm, instead of the flat matrix r640 cosmetics in asm macros r639 r638 in hpel search, merge two 16x16 mc calls into one 16x17. 15% faster hpel, .3% overall. r637 Compile fix r636 remove private stuff from public headers. no more need for -D__X264__ r635 adjust bitstream buffer sizes for very large frames r634 conflate HAVE_MMXEXT with HAVE_SSE2, since they were never used distinctly. r633 * Made -DNEED_ALTIVEC unnecessary, thanks to Guillaume Poirier. r632 * check x264_cpu_detect() before calling AltiVec functions. r631 ssse3 detection. x86_64 ssse3 satd and quant. requires yasm >= 0.6.0 r630 * Use -maltivec when building dependencies, or <altivec.h> cannot be used. * Do not declare vectors in non-AltiVec files. r629 * common/cpu.c: runtime AltiVec autodetection on Linux. * configure, Makefile: do not build the whole project with -maltivec because it generates AltiVec code in weird places. r628 fix a small memleak. patch by Limin Wang. 
r627 compile fix for GCC-3.3 on OSX, based on a patch by Patrice Bensoussan % patrice P bensoussan A free P fr% Note: regression tests still do not pass with GCC-3.3, but they never did as far as I can remember. r626 cosmetics in regression test r625 r624 r623 oops, scenecut detection failed to activate when using threads and not using B-frames r622 extras/getopt.c was BSD licensed. replace with a LGPL version (from glibc). r621 Fix build issues on Linux. Only gcc-4.x is supported, as on OSX. Cleans up a few inconsistencies in the code too. r620 tweak block_residual_write_cavlc. up to 1% faster lossless, no difference at normal bitrates. r619 don't assume int is exactly 4 bytes r618 make array_non_zero() compatible with -fstrict-aliasing r617 Honor CFLAGS and LDFLAGS set by the user r616 Check whether 'echo -n' works, otherwise try printf (fixes build on current OS X 10.5) r615 Check version of nasm on OS X / Intel r614 wrong reference frames were used with refs>=14 + pyramid (regression in r607) r613 enable thread synchronization primitives on linux too r612 fix a crash with x264_encoder_headers() + threads r611 don't skip autodetection on configure --enable-pthread r610 more win32threads -> pthreads r609 cosmetics: rename list operators to be consistent with Perl, and move them to common/ r608 win32: use pthreads instead of win32threads. for some reason, pthreads is much faster. r607 New threading method: Encode multiple frames in parallel instead of dividing each frame into slices. Improves speed, and reduces the bitrate penalty of threading. Side effects: It is no longer possible to re-encode a frame, so threaded scenecut detection must run in the pre-me pass, which is faster but less precise. It is now useful to use more threads than you have cpus. --threads=auto has been updated to use cpus*1.5. Minor changes to ratecontrol. New options: --pre-scenecut, --mvrange-thread, --non-deterministic r606 * Do not assume anything about sizeof(cpu_set_t). 
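The r607 entry above states that --threads=auto uses cpus*1.5 once frames are encoded in parallel. A minimal sketch of that rule follows. The helper name `auto_thread_count`, the rounding down, and the floor of one thread are assumptions for illustration; the changelog gives only the 1.5 multiplier.

```python
import os

def auto_thread_count(cpus=None):
    """Pick a thread count of 1.5x the CPU count (hypothetical helper)."""
    if cpus is None:
        cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    # Integer form of cpus * 1.5, rounded down; never fewer than one thread
    return max(1, cpus * 3 // 2)
```

Oversubscribing the CPUs this way fits the entry's remark that it is "now useful to use more threads than you have cpus".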
r605 * Add support for kFreeBSD (FreeBSD kernel with GNU userland). r604 Add Altivec implementations of add8x8_idct8, add16x16_idct8, sa8d_8x8 and sa8d_16x16 Note: doesn't take advantage of some possible aligned memory accesses, so there's still room for improvement r603 Force alignment of the fake .rodata on MacIntel r602 don't treat vbv_maxrate as a minrate too if it's higher than target average bitrate. r601 Merges Guillaume Poirier's AltiVec changes: * Adds optimized quant and sub*dct8 routines * Faster sub*dct routines ~8% overall speed-up with default settings r600 10% faster deblock mmx functions. ported from ffmpeg. r599 checkasm: ignore insignificant differences in floating-point ssim r598 display final ratefactor in abr when a loose vbv is applied. (still disabled in true cbr) r597 fix parsing of --deblock %d,%d (beta was ignored) r596 compute chroma_qp only once per mb r595 rd refinement of intra chroma direction (enabled in --subme 7) patch by Alex Wright. r594 fix a crash in avc2avi r593 skip deblocking and motion interpolation when using only I-frames r592 cosmetics r591 allow fractional values of crf r590 prefetch pixels for motion compensation and deblocking. r589 fix a crash on interlace + >8 reference frames r588 no more decoder. it never worked anyway, and the presence of defunct code was confusing people. 
r587 compute pskip_mv only once per macroblock, and store it r586 slightly faster chroma_mc_mmx r585 missing emms in plane_copy_mmx r584 merge center_filter_mmx with horizontal_filter_mmx r583 1.5x faster center_filter_mmx (amd64) r582 mmx/prefetch implementation of plane_copy r581 no more vfw r580 gtk fixes: in Makefile - fix datadir for mingw users - remove the shared lib during the clean rule - use $(ENCODE_BIN) instead of x264_gtk_encode - add some $(DESTDIR) and create some directories when necessary - remove -lintl statfile_length -> statsfile_length fix the "sensitivity" of the widget of update_statfile the logo is now handled correctly on windows added: beginning of multipass support patch by Vincent Torri. r579 accept mencoder's option names as synonyms (api only, not in x264cli) r578 simplify satd_sse2 r577 better error checking in x264_param_parse. add synonyms for a few options. r576 fix some strides that weren't a multiple of 16. r575 tweak motion compensation amd64 asm. 0.3% overall speedup. r574 strip local symbols from asm .o files, since they confuse oprofile r573 add an option to control direct_8x8_inference_flag, default to enabled. slightly faster encoding and decoding of p4x4 + B-frames, and is needed for strict Levels compliance. r572 allow custom deadzones for non-trellis quantization. patch by Alex Wright. r571 move zigzag scan functions to dsp function pointers. mmx implementation of interlaced zigzag. r570 support interlace. uses MBAFF syntax, but is not adaptive yet. r569 allow --zones in cqp encodes r568 cli: fix some typos in vui parameters from r542. patch by Foxy Shadis. r567 * Add an "all" rule to the Makefile. Ideally "default" should be renamed, but I don't want to break existing scripts. 
r566 workaround: on some systems, alloca() isn't aligned r565 missing picpop r564 fix a buffer overread from r540 r563 cosmetics (spelling) r562 faster ESA r561 faster ESA r560 * Use the autotool's config.guess script instead of uname to check the system and CPU types, to avoid issues when using for instance a 32-bit userland on top of a 64-bit kernel. r559 * Add the autotool's config.guess script so that we can use it instead of uname in the configure script. r558 10l in r553 r557 ssim broke on amd64 w/ pic. r556 r555 support changing some more parameters in x264_encoder_reconfig() r554 SSIM computation. (default on, disable by --no-ssim) r553 configure: --enable-debug reduces optimization to -O1 r552 cosmetics r551 gcc -fprofile-generate isn't threadsafe r550 cli: move some options from --help to --longhelp r549 cli: don't try to get resolution from filename unless input is rawyuv r548 r542 broke --visualize r547 Nicer OS X x264_cpu_num_processors (thanks David) r546 Support OS X and BeOS in x264_cpu_num_processors r545 Fixes contexts allocation with threads=auto r544 select initial qp for abr and cbr based on satd and bitrate, rather than cq24. r543 --threads=auto to detect number of cpus r542 api addition: x264_param_parse() to set options by name r541 fix a rare NaN in ratecontrol r540 move quant_mf[] from x264_t to the heap, and merge duplicate entries r539 GTK update. patch by Vincent Torri. fixed: cleaning of Makefile time elapsed seems broken ('total time' label replaced by 'time remaining') text entries of the status window are now not editable added: compilation from x264/ (add --enable-gtk option to configure) shared lib creation if --enable-shared is passed to configure x264gtk.pc --b-rdo, --no-dct-decimate r538 new option: --qpfile forces frames types and QPs. (intended for ratecontrol experiments, not for real encodes) r537 api change: select ratecontrol method with an enum (param.rc.i_rc_method) instead of a bunch of booleans. 
r536 slightly faster mmx dct r535 OpenBSD build fixes. patch by Vizeli Pascal (pvizeli at yahoo dot de) r534 mc_chroma width2 mmx r533 make libx264.so symlink relative r532 GTK update. patch by Vincent Torri. added: direct=auto no-fast-pskip vbv cqm tooltips (without descriptions yet) translations `make clean` for .exe when file exists, ask for override fixes: debug level bug bitrate slider bug mixed-refs can be set only if ref>1 i8x8 can be set only if 8x8 transform is enabled # of threads capped at 4 fourcc can't be removed cosmetics r531 vfw installer: tweak nsis compression. patch by Francesco Corriga. r530 Fixed typo that caused x264_encoder_open to always fail r529 check some mallocs' return value r528 make -> $(MAKE) r527 convert non-fatal errors to message level "warning". r526 fix a memory alignment. (no effect on x86, but might be needed for other simd) r525 when using DEBUG_DUMP_FRAME, write decoded pictures in display order. patch by Loic Le Loarer. r524 non-referenced B-frames should have the same frame_num as the following ref frame, not the previous. patch by Loic Le Loarer. r523 set the SPS constraint_set[01]_flag based on the profile in use, just in case some decoder cares r522 msvc doesn't like C99 named array initializers r521 allow sar=1/1. patch by Loic Le Loarer. r520 faster intra search: filter i8x8 edges only once, and reuse for multiple predictions. r519 faster intra search: some prediction modes don't have to compute a full hadamard transform. x86 and amd64 asm. r518 --sps-id, to allow concatenating streams with different settings. r517 typo in expand_border_mod16 r516 typo impaired 2pass bitrate prediction. r515 Let the user choose the compiler with "CC=xxx ./configure" r514 More vector types fixes for gcc 3.3 r513 More vector casts to try and make compilers happier r512 Use sa8d instead of satd for i8x8 search. 
+.01 dB, -.5% speed r511 Before evaluating the RD score of any mode, check satd and abort if it's much worse than some other mode. Also apply more early termination to intra search. speed at -m1:+1%, -m4:+3%, -m6:+8%, -m7:+20% r510 * common/ppc/pixel.c: fixed illegal implicit casts of vector types. r509 * Added %$#@#$! support for #@%$!#@ armv4l CPU. r508 When evaluating predictors to start fullpel motion search, use subpel positions instead of rounding to fullpel. about +.02 dB, -1.6% speed at subme>=3 patch by Alex Wright. r507 mmx implementation of x264_pixel_sa8d r506 10l in r463 (q0 i16x16 dc was permuted) r505 typo in r504 r504 update msvc project files. patch by anonymous. r503 Before, we eliminated dct blocks containing only a small single coefficient. Now that behavior is optional, by --no-dct-decimate. based on a patch by Alex Wright. r502 Enables more aggressive optimizations (-fastf -mcpu=G4) on OS X. Adds AltiVec interleaved SAD and SSD16x16. Overall speedup up to 20%. Patch by anonymous r501 faster cabac_encode_bypass r500 restored AltiVec dct r499 more AltiVec mc, ~4.5% overall speedup r498 slightly faster loopfilter r497 3% faster satd_mmx r496 cosmetics in sad/ssd/satd mmx r495 store quoted configure options. needed e.g. for multiple args under --extra-cflags. r494 fix a yasm-incompatible syntax in x86 asm r493 yasm noexec stack r492 more interleaved SAD. 25% faster halfpel. r491 more interleaved SAD. 1% faster umh, 6% faster esa. r490 interleave multiple calls to SAD. 15% faster fullpel motion estimation. r489 * Added support for ppc64. I'm really fucking tired of having to do this. r488 use LDFLAGS when linking shared lib r487 r486 GTK: support yuv4mpeg input. patch by Vincent Torri. r485 GTK: fix avs input patch by Vincent Torri. r484 cli: support yuv4mpeg input. patch by anonymous. 
r483 GTK: compilation fixes r482 GTK: compilation fixes on mingw, add avs input for the app (if available), add filters for the filechooser, add icon for the main window. patch by Vincent Torri. r481 GTK-based graphical frontend. patch by Vincent Torri. r480 silence some gcc warnings r479 use FDEC_STRIDE instead of a parameter in mmx dct .5% speedup r478 * configure: support for 64 bits MIPS. r477 10l in r473 and stdin r476 RD subpel motion estimation (--subme 7) r475 cosmetics in cabac_mb_cbf r474 separate --thread-input from --threads r473 if --threads > 1, then read the input stream in its own thread. r472 FreeBSD uses ELF r471 10l in r470 on x86_64 r470 some mmxext functions really only required mmx. r469 simplify get_ref and mc_luma r468 b16x16 wpred analysis used wrong weight r467 configure: --enable-shared for libx264.so r466 wrong modulus when delta_qp = +26 r465 10l in vbv + 2pass r464 macroblock-level ratecontrol: improved vbv strictness, and improved quality when using vbv. r463 keep transposed dct coefs. ~1% overall speedup. r462 tweak rounding of 8x8dct r461 cosmetics in makefile r460 cosmetics: muxers -> muxers.c r459 no --nr in intra blocks. intra prediction doesn't work well enough for the residual to be indicative of noise. r458 10l in direct auto + multiref + 1pass r457 --direct auto selects direct mode per frame. works best in 2pass (enable in both passes). r456 change default direct mode to spatial r455 remove TODO. most of it is done, and the rest is out of date. r454 more amd64 mmx intra prediction r453 for i8x8 neighbors, don't assume a new slice starts at the edge of the frame r452 * common/i386/i386inc.asm: got PIC to work for real on OS X x86. r451 * common/i386/*.asm: don't use the "GLOBAL" reserved word, some versions of NASM complain about it. Replaced it with "GOT_ebx". r450 * configure: activate minor nasm optimisations, such as assembling "add eax, 8" as "add eax, byte 8". 
r449 * common/i386: factored the .rodata section declaration into i386inc.asm. r448 * configure common/i386/i386inc.asm: got rid of -DFORMAT_* nasm flags and use built-in preprocessor tests instead. r447 * common/i386/i386inc.asm: tell the ELF linker about our stack properties so that it does not assume the stack has to be executable. r446 10l in r443 (p4x4 chroma) r445 copy current macroblock to a smaller buffer, to improve cache coherency and reduce stride computations. part 3: asm r444 copy current macroblock to a smaller buffer, to improve cache coherency and reduce stride computations. part 2: intra prediction r443 copy current macroblock to a smaller buffer, to improve cache coherency and reduce stride computations. part 1: memory arrangement. r442 h->mc.copy() r441 lowres intra used wrong neighboring pixels r440 trellis=2 slightly affected intra analysis even without subme=6 r439 * encoder/ratecontrol.c: OS X support for exp2f and sqrtf. r438 allow delta_qp > 26 r437 ratecontrol didn't always account for header bits, causing an undersize in multipass with --ratetol inf. r436 -q0 --b-rdo wasn't lossless r435 cosmetics r434 allow ',' separator for --filter r433 VfW: 10l in bime and refs r432 more lowres mv clipping fixes r431 VfW: cosmetics r430 VfW: support trellis, brdo, nr, bime. patch by Dan Nelson (dnelson at allantgroup dot com). r429 amd64 mmx for some intra pred functions r428 dequant_mmx made incorrect assumptions about extreme inputs. now uses 32bit in more cases. patch by Christian Heine. r427 lowres can reuse the normal mv cost table r426 r422 broke x264_center_filter_mmxext r425 * configure: define FORMAT_ELF under Linux and FORMAT_AOUTB under *BSD. r424 * common/i386/i386inc.asm: support for ELF, a.out and Mach-O objects. r423 * configure: added a --enable-pic flag. r422 * Additional fixes to the PIC versions of assembly routines. They now pass all checkasm tests and output streams are bit-by-bit identical, which sounds good. 
r421 * tools/checkasm.c: print the random seed used for the test, to allow for replays. It looks like dequant_4x4 fails 1 time out of 600, with the following seeds for instance: 1423 1957 2149 2455 3385 3403 3724 4095. r420 cosmetics in mc_chroma r419 * Oh, so what I thought was unused code was in fact used. This fixes my breakage but makes the code rather slow in PIC mode. I will fix it later. r418 * Support for x86 position-independent code (PIC), needed for dynamic libs on Mac OS X Intel. I tried to make this as little intrusive as possible. r417 msvc: #define isfinite() r416 x86 mmx for some intra pred functions r415 cosmetics: reorganize intra prediction dsp r414 too many systems don't have off_t; use uint64_t instead. r413 fix order of frame evaluation in pre-me r412 update AUTHORS r411 fix a check for NaN in ratecontrol r410 fix mv predictors in pre-me for b-adapt. r409 print --nr in sei params. tweak ratecontrol param checking. r408 I've moved r407 write correct VUI timing info r406 early termination in UMH search r405 split mv_range enforcement from edge-of-frame clipping. fixes an occasional artifact with long mvs. r404 cosmetics: suppress warning on unused variables r403 cosmetics: simplify #includes r402 * configure: NSLU2 platform support (why oh why) r401 Re-enabled x86 optims on MacIntel, assume Nasm CVS is installed and -f macho -DPREFIX just seems to do the job r400 Quick compile fix for OS X / Intel Optimizations are disabled at the moment. In order to get them to work, we'd need either nasm to be able to output Mach-O object files, or we should convert the assembly code to something OS X can handle, like gas. r399 cli: large file support r398 dct-domain noise reduction (ported from lavc) r397 early termination within large SADs. ~1% faster UMH, ~4% faster ESA. r396 mkv: increase nalu size size to 4 bytes. patch by Haali. 
r395 less 64bit math: 12% faster trellis r394 more error checking of input parameters r393 always write sps.vui r392 use some extra packing modes for CQM headers. fix typo in --cqm4p[yc]. r391 MSVC compatibility fixes r390 joint bidirectional motion refinement (--bime) r389 fix some overflows in mp4 timestamps. patch by Francesco Corriga. r388 Successive elimination motion search: same as exhaustive search, but 2-3x faster. r387 Fixed cc_check on OS X (gcc -o /dev/null always fails) r386 postpone pskip decision until after p16x16ref0 motion search. reduces the number of erroneous pskips in low-detail regions. r385 configure: autodetect gpac, avis, pthread, vfw r384 --no-fast-pskip patch by Alex Wright. r383 cosmetics: config.h is now modified only by configure. make now calls configure if you haven't. r382 MP4: set "track enabled" flag. patch by Robert Swain. r381 faster subpel motion search. patch by Alex Wright. r380 don't use gnu extensions to grep and sed. r379 pkg-config: major.minor.patch version r378 `make fprofiled` to automate gcc -fprofile-generate/use r377 10l r376 param.b_repeat_headers (not yet used) r375 support pkg-config. patch by Caro. r374 write encoding options to the userdata SEI and to the 2pass statsfile. check for incompatible options in the 2nd pass. r373 change default level to "5.1" r372 skip dequant+idct of decimated blocks. r371 after a 1pass ABR, print the value of --crf which would result in the same bitrate. r370 subpel search: always check mvp. r369 faster b-rdo (skip RD of modes with bad SATD). patch by Alex Wright. r368 RD mode decision for B-frames (--b-rdo) patch by Alex Wright. r367 * common/amd64/quant-a.asm: added missing GLOBAL flags that prevented PIC builds, thanks to Anssi Hannula. r366 * configure: added the Alpha platform. r365 use array_non_zero() when we don't need a full array_non_zero_count() r364 mmx dequant. up to 3% speedup w/ RD. 
r363 allow --level to understand names in addition to idc r362 check (most of) the levels constaints. set default max_mv_range based on level_idc. r361 if p16x16 RD decides to code a MB as p_skip, then don't check smaller partitions. r360 Trellis RD quantization. around +.2 dB r359 cosmetics: XCHG macro r358 skip a few duplicate candidates in qpel search. r357 skip a few duplicate candidates in fullpel hex&umh search. r356 cli: arithmetic overflow in bitrate printing r355 cosmetics in x264_cabac_mb_type r354 X264_ABS => abs r353 amd64 sse2 8x8dct. 1.45x faster than mmx. r352 allow 1pass ratecontrol with keyint=1 r351 cli: print estimated time left in --progress r350 doc/ratecontrol.txt r349 rm doc/dct.txt r348 in constant QP mode, write that QP in the PPS to save a few bits in each slice header. r347 faster decimation r346 cosmetics: fix an erroneous warning from r340. r345 cosmetics: change literal cabac_block_cat to an enum. r344 cabac: merge i_state with i_mps. bs_write multiple bits at once. r343 remove unused adaptive cabac_idc code r342 Fixed compilation on PPC (spotted by David Wolstencroft) r341 mmx deblocking. 2.5x faster deblocking functions, 1-4% overall. r340 If frame count is known at init time (cli & vfw), then abort if the 2nd pass exceeds the length of the 1st pass. If it's not known (mencoder), then report a non-fatal error when we run off the end of the 1st pass stats, and switch to constant QP. r339 move checkasm to tools/ delete unused stuff in testing/ `make clean` deletes checkasm and avc2avi r338 checkasm: check 8x8dct, mc average, quant, and SSE2. r337 r336 broke amd64 x264_pixel_sad_16x16_sse2 (though it's not being used) r336 Windows 64bit asm. patch by squid_80. r335 delete build/cygwin because it's handled in the main configure/makefile. r334 --crf: 1pass quality-based VBR. r333 Added --enable-gprof (patch by Johannes Reinhardt) r332 cosmetics: remove #if0'ed code patch by Robert Swain. 
r331 faster bs_write r330 during RDO, skip the bitstream writing and just calculate the number of bits that would be used. speedup: cabac +4-8%, cavlc +2-4%. r329 Use SAD instead of SATD for halfpel motion search. Move multiref termination after halfpel search. Total: 3-7% speedup and +/-.02 dB. patch by Alex Wright. r328 VfW: mixed refs. patch by celtic_druid. r327 allow non-mod16 resolutions r326 VfW: prevent duplicate free() in compress_end() r325 cosmetics: remove declarations of nonexistent asm functions r324 cosmetics (whitespace) in VfW r323 VfW: some reorganization patch by Francesco Corriga. r322 cosmetics: merge some duplicate tables r321 remove cabac byte-stuffing code, because it just wastes bits in lossless, and does nothing at all at sane bitrates. r320 don't allocate lowres planes if they won't be used (i.e. in the 2nd pass). r319 cosmetics: move some stuff from macroblock_encode to cache_save r318 new option: --mixed-refs Allows each 8x8 or 16x8 partition to independently select a reference frame, as opposed to only one ref per macroblock. patch mostly by Alex Wright (alexw0885 at hotmail dot com). r317 cosmetics in option parsing r316 expose the rest of the VUI flags. patch by Christian Heine. r315 * common/amd64/mc-a.asm: use RIP-relative addressing in PIC mode. r314 temporal predictors for 16x16 motion search. r313 slightly faster/cleaner block_residual_write_cabac r312 cosmetics r311 cli: fix a crash on piped input. r310 stats summary: separately report all 5 partition sizes, and add ref usages r309 disposable frames shouldn't get their own coded_frame_num. r308 typo in ia32 x264_pixel_avg_weight_w8_mmxext r307 mmx avg (already existed by not used for bipred) mmx biweighted avg (3x faster than C) r306 cosmetics: move avg function ptrs from pixf to mc. r305 with B-pyramid, forget old refs in POC order instead of coded order. (before, b_skip was unavailable with pyramid and ref=1) r304 typo in r296. patch by lurui. 
r303 * common/amd64/*.asm: use RIP-related addressing in PIC mode. r302 * common/amd64/mc-a.asm: removed useless global variables r301 * configure: support extra $(ASFLAGS) through --extra-asflags. r300 reorganized VfW UI. patch by Antony Boucher, graphic by Jarod. r299 MP4 output: update to GPAC 0.4 API. patch mostly by Robert Swain. r298 faster mmx quant 15bit, and add 16bit version. total speedup: ~0.3% patch by Christian Heine. r297 faster mmx satd. *x16: 20%, *x8: 10%, total: 2-4%. ia32 patch by Christian Heine, amd64 port by me. r296 allow i4x4 and i8x8 down-left prediction with emulated top-right samples. based on a patch by Johannes Reinhardt (Johannes dot Reinhardt at uni-konstanz dot de) r295 r294 * configure: added support for ia64, mips/mipsel, m68k, arm, s390 and hppa platforms, as well as linux sparc. r293 MMX quantization functions, and optimization of the C versions. about 3x faster quant_8x8, quant_4x4, quant_4x4_dc, and quant_2x2_dc. total speedup: 4-10%. patch by Alexander Izvorski and Christian Heine. r292 SSE2 pixel comparison functions P4: SAD 16x*, SSD 16x*, SATD 16x*: 30% faster, SATD 8x8: 15% faster, total: 2-4% faster K8: SSD 16x*: 6% faster, total: not much patch by Alexander Izvorski. r291 10l in rev290: duplicate declaration of x264_pixel_sub_8x8_mmx. r290 mmx 8x8 dct. On a K8: sub16x16_dct8 3806->1461, add16x16_idct8 4852->1297 cycles. total speedup: 1-3%. patch by Christian Heine (sennindemokrit at gmx dot net) r289 VC++ fix (thx fenrir) r288 x264.h: issue an explicit warning when neither stdint.h nor inttypes.h has be included before x264.h r287 VfW: SAR wording. patch by Sharktooth. r286 cli: workaround to allow "--ratetol inf" on win32. r285 r284 * all: Patch by Mike Matsnev : "The following things were fixed: * AR calculation was broken on previous import * Wrong conditional in write_nalu_mkv() was fixed * Error checking was added in all places" r283 xyuv: bug fixes + autodetect of video size. 
r282 Run ranlib after make install (OS X needs that) r281 update i_mb_b16x8_cost_table[] for I8x8 mb type (r278 only fixed a symptom). r280 * all: Added matroska writing. Patch by Mike Matsnev. r279 * pixel.*: "I have completed additional SAD implementations (8x16, 16x8 and 16x16) using Sparc VIS. Overall speedup is roughly 90% from straight C. I'm doing development and testing on a Sun Fire V220, with 2 * 1.5 GHz UltraSPARC-III CPUs. I've hand-unrolled each of the loops. Sun's assembler does not appear to have macro functionality built-in and I didn't want to establish an external dependency on m4. Please let me know if you run into any trouble with the patch." Patch by Phil Jensen. r278 analyse: "It corrects the size of array i_mb_b16x8_cost_table from 16 to 17; otherwise, it can result in a mismatch of b16x8 mb type cost and a memory read overflow on it." Patch by lurui. r277 * x264 compilation on NetBSD. Patch by Mike Matsnev. r276 * all: "8x8 SAD written in Sparc Assembly using VIS." Patch by Phil Jensen. r275 10l: rd score for sub-8x8 partitions used wrong mvs. r274 faster SAD_INC_2x16P for amd64. patch by Josef Zlomek. r273 Fixed win32 handle leakage (thanks Trax) Default enabled support of threads on BeOS r272 * Add support for UltraSparc (uname -m: sun4u) with Solaris. Patch by Tuukka Toivonen. r271 * Faster SAD_INC_2x16P. Patch by Alexander Izvorski. r270 example quant matrix file r269 --cqmfile reads quant matrices in a JM-compatible format. r268 adjust coded buffer size based on input resolution and QP (old default wasn't enough for HD lossless) r267 update avc2avi for high profile r266 custom quant matrices r265 VfW: workaround a windows unicode bug. patch by Leowai. r264 lossless mode enabled at qp=0 r263 VfW: enable RDO. some option dependencies. patch by Francesco Corriga. r262 rate-distortion optimized MB types in I- and P-frames (--subme 6) r261 more VfW options. patch mostly by celtic_druid. r260 VFW: 8x8 transform, SAR. patch by celtic_druid.
r259 threads option in vfw. patch by celtic_druid. r258 win32 threads enabled by default r257 vfw installer nsis script. patch by Francesco Corriga. r256 print 8x8 transform usage % in stats summary. r255 revert 216, another try at max_dec_frame_buffering. disable adaptive cabac_idc by default; 0 is always best anyway. r254 typo in cabac tables r253 cosmetics r252 fix i8x8 decision with chroma_me r251 SATD-based decision for 8x8 transform in inter-MBs. Enable 8x8 intra. CLI options: --8x8dct, --analyse i8x8. r250 Use win32 native threads (you still have to --enable-pthread to use them, though) r249 slightly faster 8x8 dct r248 remove unused tables from SPS/PPS. reduces overhead when syncing threads. r247 10l (debug stuff in 246) r246 8x8 transform and 8x8 intra prediction. (backend only, not yet used by mb analysis) r245 cosmetics r244 fix a bug with cabac + B-frames + mref + slices. call visualization per frame instead of per slice. r243 accept the standard --prefix etc. options r242 tweak cflags r241 Fixed multithreading on BeOS (pthread emulation required) r240 multithreading (via slices) r239 move zones parsing to ratecontrol.c; allows passing in zones as a string. r238 UMHex motion seach (but no early termination yet) r237 Zoned ratecontrol. r236 fix rounding of intra dequant when qp<=3 r235 API: x264_encoder_reconfig(). (not yet used by any frontend) r234 Makefile: in target "install", first create the directories if they don't already exist r233 Optimized subXxX_dct r232 s/==/=/ r231 ppc/: compile fixes for Linux/PPC (courtesy of Rasmus Rohde) and for gcc < 4 r230 visualize reference pic numbers. misc cleanups in visualization. patch by Tuukka Toivonen. r229 ppc/*: more tuning on satd (+5%) r228 CLI option: --seek r227 CLI option: --visualize Displays the encoded video along with MB types and motion vectors. patch by Tuukka Toivonen. r226 fix an uninitialized value in slicetype_analyse r225 port recent MC asm changes to amd64. patch by Josef Zlomek. 
r224 ppc/*: + Removed unused code + Optimized mc chroma 4xH and satd 8x4 and 4x8 + Won a bunch of cycles by not trusting gcc about inlining and unrolling properly (about 17% faster globally) r223 New ratecontrol options: 1pass ABR. VBV constraint for ABR and 2pass. There is no longer a dedicated CBR mode: use ABR+VBV. VfW now uses ABR instead of CQP for 1st of multipass. r222 use a predicted mv as starting point for subpel refinement. r221 slight speedup in halfpel interpolation. patch by Mathieu Monnier. r220 Cleaner allocation of tmp space in halfpel interpolation; fixes some valgrind/nasm warnings. patch by Mathieu Monnier. r219 "2pass failed to converge" is no longer considered fatal. r218 Updated MSVC project files. thanks to Bonzi. r217 cosmetics. silence some gcc warnings. amd64 doesn't need a separate copy of the c/h files, only the asm. r216 10l (214 wrote wrong DPB size in SPS -> B-pyramid broke) r215 CLI (mp4): return to 'capture' output mode, remove useless SetCtsPackMode() (fixed in gpac). Note: requires gpac cvs-20050419 or later. patch by bobo. r214 combined L0 & L1 reference lists are limited to a total of 16 pics. r213 amd64 asm patch, part2. by Josef Zlomek ( josef dot zlomek at xeris dot cz ) r212 amd64 asm patch, part1. r211 Allow manual selection of fullpel ME method. New method: Exhaustive search. based on a patch by Tuukka Toivonen. r210 misc makefile changes. propogate --extra-cflags to vfw. 'make clean' removes x264.exe and vfw. tweak dependencies. r209 10l (CLI: fflush after progress update) r208 CLI: progress indicator r207 VfW: build from main makefile r206 [mp4] ftyp & moov boxes at the begining of the file, (thanks to jeanlf for comments) patch by bobololo r205 CLI: --fps had side-effects. fixed. r204 CLI: cosmetics r203 Makefile: strip x264cli. tweak stats summary. r202 * x264.c: Fix ctts box creation. Patch by bobololo from Ateme. r201 common/ppc: more cleaning, optimized a bit r200 CLI: require output file (don't default to stdout). 
warn if trying to use mp4 or avis when not supported. misc cleanup. r199 configure:use -falign-loops=16 on OS X common/ppc/: added AltiVecized mc_chroma + cleaning checkasm.c:really fixed MC tests r198 Configure tweaks. Allow avis-input in mingw. Turn off debug by default. r197 checkasm.c: fixed MC tests r196 CLI: MP4 muxing. patch by bobo from Ateme. r195 Cygwin fixes r194 configure: ooops, restored -g ratecontrol.c: OS X has exp2f in -lmx checkasm: quick compile fix r193 add x86_64 to configure r192 set svn:ignore r191 Added a configure to detect the platform/system/etc so people don't have to edit the Makefile (will work for Linux/OS X/BeOS/FreeBSD, feel free to modify for others), and we can now remove the Jamfile which was broken most of the time anyway. r190 Makefiles: better dependencies for SEI version number r189 Forgot rbsp_trailing_bits in AUD NAL r188 Optionally use access unit delimiter NAL units. r187 VfW: cleaner install on win98. patch by Riccardo Stievano. r186 new util: countquant for 2pass statsfiles r185 print svn version number in SEI info and in CLI/VfW. r184 Make reconstructed frame available to caller. r183 make install r182 free() -> x264_free() r181 CLI: flush B-frames at the end of the encode r180 convert mc's inline asm to nasm (slight speedup and msvc compatibility). patch by Mathieu Monnier. r179 buffer overruns in slicetype_decision. patch by Mathieu Monnier. r178 tweak usage message r177 Simplify inter analysis option names. (psub16x16 -> p8x8) patch by Robert Swain. r176 173 broke .depend when debugging was enabled r175 early termination for intra4x4 analysis r174 Check/fix range of x264_param_t.rc.i_qp_constant. r173 Cleaned up and fixed Makefile for OS X and BeOS (hopefully FreeBSD too) It defaults for x86/linux, others: uncomment the lines for your platform & OS at the beginning of the Makefile r172 macroblock_analyse: simplify cost comparisons. (cosmetic) CLI: enable cabac by default. r171 Chroma ME (P-frames only). 
r170 SSE optimized chroma MC. patch by Radek Czyz. r169 167 broke psnr calculation for non-mod-32 inputs r168 sqrtf requires -lmx on Mac OS X r167 use mmx ssd for psnr calculation. r166 revert 164. blame Spyder. r165 SSD comparison function (not yet used). Cosmetics in mmx SAD. r164 VfW: reject YUY2 and RGB input formats r163 Really fix QP override. r162 write VUI bitstream restrictions r161 AVI & Avisynth input (win32 only). patch by bobo from Ateme. r160 expose option "chroma qp offset" r159 Fix per-frame QP override broken in rev 137. r158 Don't include x264.o in the library. r157 VfW: expose B pyramid and weighted B prediction. patch by Riccardo Stievano. r156 10l r155 buffer overrun when bframes == X264_BFRAME_MAX r154 Adaptive B skipped some POC numbers (slightly reducing b_direct efficiency). r153 avc2avi: Use POC to determine frame boundaries (frame_num couldn't distinguish consecutive B-frames). Fix keyframe flag to mark IDR only, not all I slices. r152 allow 16 refs (instead of 15) r151 report version number in decimal instead of hex r150 New option: "B-frame pyramid" keeps the middle of 2+ consecutive B-frames as a reference, and reorders frame appropriately. r149 smarter parsing of resolution from commandline r148 ratecontrol.c: fixed exp2f on BeOS so rate control works properly r147 Fix a buffer overrun with very long MVs. r146 wrong stride in lowres image r145 10l (fast1stpass was slower than non-fast) r144 Disable deblocking filter in frames of sufficiently low QP that it would have no effect. (Saves a little CPU time in the decoder.) r143 Simplify x264_frame_expand_border. r142 Altivec functions for MC using the cached halfpel planes. Patch by Fredrik Pettersson <fredrik_pettersson at yahoo dot se>. r141 Don't use uninitialize MVs in x264_mb_predict_mv_ref16x16. r140 Implicit weights in B16x16 analysis were swapped. patch by Radek Czyz. r139 Cosmetics: Some renaming. 
Move the rest of slice type decision from encoder.c to slicetype_decision.c r138 Take into account keyint_max in B-frame decision. r137 Preliminary adaptive B-frame decision (not yet tuned). Fix flushing of delayed frames when the encode finishes. r136 Write x264's version in a SEI message. r135 VfW: Enable weighted B prediction when max B-frames > 1. Enforce max reference frames <= 15. patch by Riccardo Stievano. r134 Add: implicit weighted prediction for B-frames. Slightly optimize x264_mb_mc_01xywh. Fix an error in B16x8 cost. r133 Oops, increment API number. r132 Configurable level. Levels are still not enforced; it's up to the user to select a level compatible with the rest of the encoding options. Patch by Jeff Clagg <snacky at ikaruga dot co dot uk>. r131 Always use the tempfile and rename method for multipass stats, so that VfW knows whether the previous pass completed. r130 More tweaks to bitrate prediction. Change error messages when 2pass fails to converge. r129 Improved 2pass bitrate predictor. No real change most of the time, but allows correct ratecontrol on some pathological videos that used to diverge completely. Also improves prediction when 2nd pass bitrate is very different from 1st pass. The new qscale2bits() has no simple inverse, so I also had to change rc_eq to output qscale instead of bits. r128 Some defines needed by MSVC, and convert the DSP files to DOS-style newlines. Patch by Radek Czyz. r127 Precalculate lambda*bits for all allowed mvs. 1-2% speedup. r126 Deblock B-frames. (Not yet used, since B-frames aren't kept as references.) r125 Simplify x264_mb_mc_01xywh() r124 Save some memcopies in halfpel ME. Patch by Radek Czyz. r123 Cache half-pixel interpolated reference frames, to avoid duplicate motion compensation. 30-50% speedup at subq=5. Patch by Radek Czyz. r122 In N-pass mode if stat_in and stat_out are the same file, instead save to a temp file and overwrite stat_in only when the encode finishes. 
r121 VfW: x264_log now creates a window for error messages r120 cosmetics r119 bs_align_1() didn't actually write all ones. (so encoded streams with cabac were technically invalid, though no decoder cares.) Patch by Tuukka Toivonen. r118 VfW: tweak option names r117 VfW: use separate stats files for each pass of an N-pass encode. r116 VfW: Enable multipass by default, increase the configurable range of I and B quant ratios. core: Tweak error messages. r115 r114 didn't completely fix the problem, trying again. r114 Another MV clipping fix. r113 Simplify x264_cabac_mb_type. r112 More accurate clipping rectangle for motion search. (slight compression improvement for high-motion scenes) r111 encoder/encoder.c: gcc < 3 compile fix r110 Change default level from 2.1 to 4.0 until I get around to calculating actual levels. r109 Clipping mvs to within picture + emulated border when running motion compensation. r108 Fix clipping of mvs in probe_pskip. (Previously it mixed up fullpel with qpel.) This should eliminate the black blocks that sometimes appeared in high motion, low detail scenes. r107 Fix length of strings stored in the registry. Patch by Riccardo Stievano. r106 registry values for min/max keyint were mixed up r105 VfW: expose option "Nth pass" (i.e. simultaneously read and update the multipass stats file). Patch by Riccardo Stievano. r104 add "make NDEBUG=1" to strip library r103 finish subpixel motion refinement for B-frames (up to 6% reduced size of B-frames at subq <= 3) r102 VfW: expose the 2pass ratecontrol option: qcomp ("bitrate variability"). Some rearranging of the advanced configuration dialogue. Patch by Riccardo Stievano <walkunafraid at tin dot it>. r101 VfW: Support ip_factor and pb_factor, some cleanups. patch by Riccardo Stievano <walkunafraid at tin dot it> r100 Use floats instead of int64 in log messages, since win32 (incl. mingw) doesn't understand %lld. Also display MB statistics in percent instead of number. 
r99 finished printf -> x264_log conversion. r98 Don't apply keyframe boost to I-frames that are followed by another I. r97 New VfW option: "fast 1st pass" automatically disables some partitions and reduces ME quality and number of reference frames. Removed option direct_pred=none, since it provides no benefits. Patch by Riccardo Stievano <walkunafraid at tin dot it>. r96 vfw: tweak wording and defaults r95 From Riccardo Stievano <walkunafraid at tin dot it>: here's a patch that fixes the VfW frontend after the changes made in revision 93 (GOP size management). Default values for i_keyint_max and i_keyint_min have been set to 250 and 10, respectively. r94 My last change of IDR decision broke in 2pass mode. fixed by remembering which frames are IDR. Disable benchmarking, as it was very slow for some people, and we already know that all the time is spent in macroblock analysis. r93 Changes the mechanics of max keyframe interval: Now enforces min and max GOP sizes, and allows variable numbers of non-IDR I-frames within a GOP. r92 MinGW compatible resource.rc by Radek Czyz r91 strict QP offset for B-frame vs following P-frame strict QP offset for I-frame vs GOP average r90 r72 broke B-frames without intra4x4. fixed. r89 updated VfW interface by Radek Czyz r88 improved mv prediction: 1-3% better compression of B-frames early termination for B-frame ref search: up to 20% faster with lots of refs. r87 allow constant qp on Nth pass (e.g. for forcing frame types) r86 disable subme=0 (the huge bitrate penalty wasn't worth the speed) renumber direct_pred r85 oops, last patch had some debug statements r84 fix: "x264 -A all" didn't include b8x8 types. add: "make NDEBUG=1" to strip library update TODO with B-frame status r83 Reorganize frame type selection: No longer produces consecutive I-frames when B-frames are enabled. Not thoroughly tested, but works for me. Fix scenecut detection when B-frames are present: Can now produce IDR, but is slower since it re-encodes more frames. 
This might reduce compression ratio in the presence of quick fade-ins. 2pass ratecontrol deals more gracefully with completely skipped frames. r82 remove Makefile.cygwin because build/cygwin/Makefile is more up to date. put correct object file names in .depend r81 reduce default verbosity, add option -v r80 remove relative include paths, to avoid conflicts with libtool r79 rename *.asm to avoid conflicts with libtool r78 list default settings in --help r77 replace EPZS diamond with a hexagon search pattern. early termination for multiple reference frame search (up to 1.5x faster). r76 sps->i_num_ref_frames was set higher than necessary r75 new option: --fps r74 various cleanups in macroblock caching. store motion data for each reference frame (but not yet used). r73 more accurate cost for psub8x8 modes. r72 implement macroblock types B_16x8, B_8x16 tweak thresholds for comparing B mb types r71 simplify x264_mb_predict_mv_direct16x16_temporal r70 option '--frames' limits number of frames to encode. patch by Tuukka Toivonen <tuukkat at ee.oulu.fi> r69 simplify calvc mb type r68 implement macroblock types B_SKIP, B_DIRECT, B_8x8 r67 rename 'core/' to 'common/', which avoids conflicts with libtool r66 cleanup stats reporting report B macroblock types report average QP r65 apply ip_factor and pb_factor in constant quantiser encodes. r64 save a little bit of memory r63 multiple hypothesis mv prediction: 1-3% improved compression, and .5-1% faster r62 * analyse: we can do 4x4 Horizontal Up mode when LEFT is avaible. r61 improved 2pass ratecontrol: ensures that I-frames have comparable quantizer to the following P-frames, and produces more consistent quality in areas of fluctuating complexity. r60 more informative error message when 2pass fails to converge r59 #include <stdarg.h> r58 cleanup spacing of frame stats with verbose logging. 
r57 typo in x264_cabac_mb_sub_b_partition (see ITU-T H.264 clause 9.3.3.1.2) r56 Typo r55 + No need to emulate memalign on OS X + Fixed Makefile for OS X (Original patch by Peter Handel) r54 Conditionally inits 1pass rc, only if it's enabled. This prevents a couple of irrelevant warnings from appearing in constant QP mode. (Loren Merritt <lorenm at u dot washington dot edu>) r53 Oops, changing those types messed up some vprintf's. fixed. (Loren Merrit <lorenm at u dot washington dot edu>) r52 filesize (bits) in a 32 bit int will overflow after 250MB, screwing up 2pass ratecontrol. (patch by Loren Merritt <lorenm at u dot washington dot edu>) r51 fix compilation on FreeBSD (from Loren Merritt (thanks to Igla)) r50 * ratecontrol: Patch by Loren Merritt : " This patch * calculates average QP as a float, providing slightly improved ratecontrol if the first pass was CBR. * fixes the reported QP if you set both b_stat_read and b_stat_write, allowing 3 pass encoding (or just examination of the 2nd pass's stats)." r49 * all: Patch by Loren Merritt. " This patch makes scene-cut detection based on the relative cost of I-frame vs P-frame, rather than just on the number of I-blocks used. It also makes the scene-cut threshold configurable. This doesn't have a very large effect: Most scene cuts are obvious to either algorithm. But I think this way is better in some less clear cut cases, and sometimes finds a better spot for an I-frame than just waiting for the max I-frame interval." r48 * ratecontrol: added 'b' flag to fopen. r47 * all: Patches by Loren Merritt: "Improved patch. Now supports subpel ME on all candidate MB types, not just on the winner. 
subpel_refine: (completely different scale from before)
0 => halfpel only
1 => 1 iteration of qpel on the winner (same as x264 r46)
2 => 2 iterations of qpel (about the same as my earlier patch, but faster)
3 => halfpel on all MB types, qpel on the winner
4 => qpel on all
5 => more iterations
benchmarks: mencoder dvd://1 -ovc x264 -x264encopts qp_constant=19:fullinter:cabac:iframe=200:psnr
subpel_refine=1: PSNR Global:46.82 kb/s:1048.1 fps:17.335
subpel_refine=2: PSNR Global:46.83 kb/s:1034.4 fps:16.970
subpel_refine=3: PSNR Global:46.84 kb/s:1023.3 fps:14.770
subpel_refine=4: PSNR Global:46.87 kb/s:1010.8 fps:11.598
subpel_refine=5: PSNR Global:46.88 kb/s:1006.9 fps:10.824"
And "The current code for calculating the cost of encoding which reference frame a MB is predicted from, introduces a bias towards ref0 and against P16x16. Removing this bias produces an improvement of .4% - 2% bitrate, depending on content and number of reference frames." r46 * x264: added --ipratio --pbratio in help section. r45 * ratecontrol: patch by Loren Merritt. "Use average qp instead of last qp in the frame for 2pass rc. (Improves quality and rate accuracy if the first pass was cbr.)" r44 * x264: added --quiet and --no-psnr. r43 * eval.c: lalala ;) r42 * added Loren Merritt. r41 * all: added eval.c (I hope libx264.dsp is correct, I can't test). r40 * all: 2pass patch by Loren Merritt <lorenm AT u.washington DOT edu> "Mostly borrowed from libavcodec. There is not much theoretical basis behind my choice of defaults for rc_eq, qcompress, qblur, and ip_factor." r39 * all: first part of the 2pass patch by Loren Merritt (only the header/textures bits computed for now). r38 * all: include stdarg.h (needed for x264_log) r37 Use x264_log() in ratecontrol.c r36 * encoder/encoder.c: oops. (fixed compilation). r35 * all: more fprintf -> x264_log. r34 * all: added a x264_param_t.analyse.b_psnr r33 * encoder/encoder.c: kb/s with k=1000 (more consistent).
Patch by Loren Merritt <lorenm AT u DOT washington DOT edu> r32 * all: introduced a x264_log function. It's not yet used everywhere but we should start using it :) r31 OS X is missing exp2f() r30 r29 Add my svn user name. r28 Bugfix. r27 Include timing info in VUI. Change frame rate from float to fraction (sorry for the inconvenience). r26 Add TAGS rule. r25 Fixes by Loren Merritt (lorenm at u.washington.edu). r24 Get rid of integer overflows that caused the rate control to go haywire in some situations. r23 * encoder: correct range for i_idr_pic_id is 0..65535 (Not 0..65534) r22 ratecontrol: patch by Loren Merritt <lorenm AT u DOT washington DOT edu> "The new cbr mode fails to completely disable itself when encoding in constant QP mode. The per-block QPs are then randomized between QP+4 and QP-2 based on uninitialized ratecontrol parameters." r21 * ratecontrol: patch by Måns Rullgård <mru AT mru DOT ath DOT cx> "This patch fixes a small bug (divide by 0 possible) in the rate control." r20 * encoder: simpler scene cut detection (seems better but do not check size anymore, so need more testing). r19 * all: Change the way PSNR is computed (based on a patch by Loren Merritt <lorenm AT u DOT washington DOT edu>).
Using SQE(DeltaSourceReconstructed) = Sum( delta^2 )
PSNR( SQE, Size ) = -10Ln( SQE / 255^2 / Size )/Ln(10)
Y+U+V : Union of YUV planes.
Now there is
- Mean PSNR : Sum( PSNR( SQE(Y/U/V), Size(Y/U/V) ) ) / TotalFrames
- Average PSNR: Sum( PSNR( SQE(Y+U+V), Size(Y+U+V) ) ) / TotalFrames
- Global PSNR: PSNR( Sum( SQE(Y+U+V) ), Size(Y+U+V)*TotalFrames )
Mean PSNR is used by the JM, and Average/Overall is used on Doom9 for example.
r18 * x264.h: increased X264_BUILD.
r17 * all: Patch from Måns Rullgård <mru AT mru DOT ath DOT cx> "Here's a patch that adds some kind of rate control. I suppose it is by no means perfect, but it's much better than constant quantizer. It also has a very crude scene change detection that sometimes avoids a buffer underflow by reencoding oversized P/B frames as I frames." r16 Linux PPC AltiVec fix r15 BeOS fixes (no stdint.h, no libm) r14 Attempt to fix build on Linux PPC r13 * encoder.c, analyse.c, macroblock: fixed when using a qp per MB. (Buggy for pskip and mb with null cbp luma and chroma). * dct*: fixed order of idct. r12 * cpu.asm: mmh trashing ebp,esi and edi isn't a good idea I fear ;) r11 * all: fixed sse2 runtime selection. r10 update & SSE2 support r9 update r8 remove some unused code r7 support for building checkasm.exe r6 * build fix (thx xxcd). r5 * TODO: test. r4 * vfw/* : oops... r3 * mc-c.c compilation fix for gcc >= 3.3 r2 re-import of the CVS.
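The three PSNR summaries defined in the r19 entry can be sketched as follows. This is a minimal illustration in Python, not x264's actual C code: the function names (`sqe`, `psnr`, `psnr_summary`) and the frame layout (a dict of per-plane sample lists) are assumptions made for the example, and 8-bit samples with a peak of 255 are assumed, as in the changelog formula.

```python
import math

PEAK = 255.0  # maximum 8-bit sample value, as in the r19 formula


def sqe(source, reconstructed):
    """Sum of squared differences between two equally sized sample lists."""
    return sum((s - r) ** 2 for s, r in zip(source, reconstructed))


def psnr(sqe_value, size):
    """PSNR in dB from a squared-error sum taken over `size` samples."""
    if sqe_value == 0:
        return math.inf  # identical planes: PSNR is unbounded
    return -10.0 * math.log10(sqe_value / PEAK ** 2 / size)


def psnr_summary(frames):
    """frames: list of {"Y": (src, rec), "U": (src, rec), "V": (src, rec)}.

    Hypothetical layout chosen for this sketch only.
    """
    n = len(frames)
    mean = {"Y": 0.0, "U": 0.0, "V": 0.0}  # per-plane PSNR, averaged over frames
    average = 0.0                          # per-frame PSNR of Y+U+V, averaged
    total_sqe = 0
    total_size = 0
    for frame in frames:
        frame_sqe = 0
        frame_size = 0
        for plane, (src, rec) in frame.items():
            error = sqe(src, rec)
            mean[plane] += psnr(error, len(src)) / n
            frame_sqe += error
            frame_size += len(src)
        average += psnr(frame_sqe, frame_size) / n
        total_sqe += frame_sqe
        total_size += frame_size
    # Global PSNR: one PSNR over the squared error of the whole sequence.
    return {"mean": mean, "average": average, "global": psnr(total_sqe, total_size)}
```

Mean and Average PSNR average decibel values, so a single near-perfect frame can pull them up sharply; Global PSNR sums the squared error first and is therefore less sensitive to such frames.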
|
| Comments |
Brilliant software. I use this commandline tool as part of a conversion sequence to turn TV captures into smaller .mp4s for later playback on a "WDTV Live" box. The X264 commandline can be daunting to figure out initially (examples abound though, just search) but once you have a useful commandline then Bob's your Uncle. eg "my" commandline creates h264 video which is proven fully compatible with the WDTV Live in terms of the "video technical compliance stuff". Happy days. X264, when combined with FFMPEG to convert audio and with MP4box to mux the video/audio into an .mp4, provides you with capability to create your own (repeatable) custom tailored encodes. Fantastic.
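A workflow like the one described in this comment (x264 for video, FFmpeg for audio, MP4Box for muxing) might be scripted roughly as below. This is only a sketch: the file names, the CRF value, the preset, the AAC audio settings, and the helper names `build_commands` and `run_pipeline` are assumptions for illustration, not the commenter's actual command line. It also assumes an x264 build that can read the source file directly, and some FFmpeg builds need extra options to enable their AAC encoder.

```python
import subprocess
from pathlib import Path


def build_commands(source, output, crf=20, preset="medium"):
    """Return the three command lines: encode video, convert audio, mux."""
    video = str(Path(output).with_suffix(".264"))  # raw H.264 elementary stream
    audio = str(Path(output).with_suffix(".m4a"))  # AAC audio track
    return [
        # 1. Video: quality-based (CRF) encode with x264.
        ["x264", "--crf", str(crf), "--preset", preset, "-o", video, source],
        # 2. Audio: drop the video stream (-vn) and encode the audio to AAC.
        ["ffmpeg", "-i", source, "-vn", "-c:a", "aac", "-b:a", "128k", audio],
        # 3. Mux both tracks into a new .mp4 file.
        ["MP4Box", "-add", video, "-add", audio, "-new", output],
    ]


def run_pipeline(source, output, **options):
    """Run each step in order, stopping at the first failure."""
    for command in build_commands(source, output, **options):
        subprocess.run(command, check=True)
```

Keeping the commands in one function makes the encode repeatable: the same settings are applied to every capture, and only the input and output names change.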
|
|||||||||||||||
Extreme compression might be a very good feature for sharing, contrary to my previous comment. Still figuring out quality settings for personal backup. Other ripping tools like Xvid4PSP, StaxRip, RipBot264, FairUse Wizard and MEGUI must be updated to this version accordingly.
|
|||||||||||||||
r1703: better compression, but the video loses overall sharpness. It's disappointing. Target video bit rate: 1500 Kbps. Actual video bit rate: 817 Kbps (far below the target, which results in poor quality). Hoping for improvement in the next release.
|
|||||||||||||||
Simply the best implementation of the H.264 spec. It is a CLI tool, so some patience is required to learn it; otherwise use one of the great GUIs like RipBot264, StaxRip, or MeGUI.
|
|||||||||||||||
The BEST, and my favorite, video encoder. Thanks for the continuous updates. Note: the x264 VFW codec needs to be updated just as regularly!
|
|||||||||||||||
Simply the best H.264 encoder available, no doubt. Thanks to the authors for keeping it FREE and for keeping the updates coming.
|
|||||||||||||||
By far the best H.264 encoder I've ever experienced. It even dwarfed all these commercial products and it's getting better!
|
|||||||||||||||
Unless your device does not support H.264 (MPEG-4 Part 10, AVC), there is NO REASON why you should not be using x264. Even if you are not a console god, there are plenty of GUIs that harness the power of this codec implementation.
|
|||||||||||||||
I have been a die-hard fan of AVI with XviD and MP3 for a LONG time! I just recently graduated to using MKV files and building my own chapters. THEN I discovered that I can encode H.264/x264 files and *directly* mux the AC3 audio from a DVD rip into an MKV. What I did NOT expect was the quality of the video at such low bit rates. I use my Western Digital WD TV very extensively to play the videos I make. When using XviD to encode 720p videos (1280x720), I *must* run at least 5000 kbps to have a decent picture. With H.264 (or better yet, x264.exe) the video quality is superior at only 2500 kbps!!! Now I wish my Creative Zen would support H.264, because it is leaps and bounds better than WMV9!! I love this CLI tool, thanks so much to the author(s)!!! I hope to have a GUI built soon, and have plans to make a GUI tool kit for MKVs! Thanks!
|
|||||||||||||||
With this codec, you have DVD-like picture quality at VCD bitrates! IMO, the future is here, in this codec! I use it with the SUPER encoder to batch convert DVB MPEG-2 files of various frame sizes. The speed is 1/4 realtime on my Core 2 Duo 6600. The VFW version is faster (about 2/5 realtime). You can use it with VirtualDub. An excellent choice for those who like cutting-edge solutions, or something with a great future in front of it!
|
|||||||||||||||
x264 is the best codec I have ever used. Thanks to DeathToSheep's unofficial VFW version, I can keep using it with VirtualDub. I capture with MainConcept PVR in MPEG-2 (quality 32) and convert with VirtualDubMPG to AVI files (x264, single pass, bitrate 800). With this combination of video tools I can put 13 episodes (50 minutes per episode) of my favorite soap, "Aspe murders", on 1 DVD, and the quality is much, much better than VHS.
|
|||||||||||||||
After being skeptical about AVC/H.264, I finally broke down and decided to try it for some iPod movies. The source files were MPEG-1 @ 1856 kbps, ripped from some high-bitrate xVCDs I did years ago. I tried doing these with 3ivX and DivX 6.2.5 and wasn't pleased with the results, especially on HDTV. I then tried x264 using MeGUI and I am blown away by the 2-pass quality @ 700 kbps. Even at a low resolution of 352x144, these movies look good (not great) on a 42" HDTV, and the file sizes are quite small. MeGUI is a very powerful program but not exactly for noobs. When 640x480 iPod resolutions become possible, this codec will be unstoppable!!
|
|||||||||||||||
What's XviD?? What's DivX?? What's WMV?? No way. x264 is the best codec ever. High quality at low bitrate. I have been using this awesome codec for months. No rival. Cauptain
|
|||||||||||||||
Better quality than XviD and a smaller file size. Use the latest ffdshow from Celtic_Druid for playback. Be aware that playback is CPU intensive: not designed for machines below 2.0 GHz (yet).
|
|||||||||||||||
Now works correctly in Sony Vegas. Still testing quality against other H.264 encoders, but so far this is a winner.
|
|||||||||||||||
