Alright! So now we’ve got the reference encoder and a few options to play back our files – let’s jump into making some newfangled video files! This post will give you all the information you need to set up a simple workflow for creating HEVC files.
All of the tools we’ll be using for this demonstration are command-line only, so I would recommend you set up a work folder to make execution easy. For myself, I set up my work folder as C:/hevc/ – no fuss, no muss. All of the tools we use will be stored here, as well as input/output files during our encodes.
The first tool we’ll need is the ubiquitous ffmpeg. We’ll be using ffmpeg to convert source material into raw .yuv video to feed to TAppEncoder.
The second tool is the aforementioned TAppEncoder – the HM10.1 reference encoder.
Finally, we’ll need mp4box if we want to mux our HEVC streams into mp4 files.
For good measure we’ll also pick up a copy of VirtualDubMod – or you can use vanilla VirtualDub if you wish. We’ll use this if we want to make frame-accurate cuts to our source file in order to segment the workload.
Now that we have all of our tools we need to create a config file for TAppEncoder to use. The very first config file I personally tested came from this blog (which also tells you how to build your own TAppEncoder, if you’re so inclined) and I believe it matches the ‘encoder_lowdelay_main.cfg’ provided with the reference encoder. Create a new text document in your work directory and rename it something simple – I chose test.cfg. Then paste one of the pre-made config files and save it.
#======== File I/O =====================
BitstreamFile : mobile.hevc
ReconFile : mobile_out.yuv
FrameRate : 24 # Frame Rate per second
FrameSkip : 0 # Number of frames to be skipped in input
SourceWidth : 352 # Input frame width
SourceHeight : 288 # Input frame height
FramesToBeEncoded : 10 # Number of frames to be coded
#======== Unit definition ================
MaxCUWidth : 64 # Maximum coding unit width in pixel
MaxCUHeight : 64 # Maximum coding unit height in pixel
MaxPartitionDepth : 4 # Maximum coding unit depth
QuadtreeTULog2MaxSize : 5 # Log2 of maximum transform size for
# quadtree-based TU coding (2...6)
QuadtreeTULog2MinSize : 2 # Log2 of minimum transform size for
# quadtree-based TU coding (2...6)
QuadtreeTUMaxDepthInter : 3
QuadtreeTUMaxDepthIntra : 3
#======== Coding Structure =============
IntraPeriod : -1 # Period of I-Frame ( -1 = only first)
DecodingRefreshType : 0 # Random Access 0:none, 1:CDR, 2:IDR
GOPSize : 4 # GOP Size (number of B slice = GOPSize-1)
# Type POC QPoffset QPfactor tcOffsetDiv2 betaOffsetDiv2 temporal_id #ref_pics_active #ref_pics reference pictures predict deltaRPS #ref_idcs reference idcs
Frame1: B 1 3 0.4624 0 0 0 4 4 -1 -5 -9 -13 0
Frame2: B 2 2 0.4624 0 0 0 4 4 -1 -2 -6 -10 1 -1 5 1 1 1 0 1
Frame3: B 3 3 0.4624 0 0 0 4 4 -1 -3 -7 -11 1 -1 5 0 1 1 1 1
Frame4: B 4 1 0.578 0 0 0 4 4 -1 -4 -8 -12 1 -1 5 0 1 1 1 1
ListCombination : 1 # Use combined list for uni-prediction in B-slices
#=========== Motion Search =============
FastSearch : 1 # 0:Full search 1:TZ search
SearchRange : 64 # (0: Search range is a Full frame)
BipredSearchRange : 4 # Search range for bi-prediction refinement
HadamardME : 1 # Use of hadamard measure for fractional ME
FEN : 1 # Fast encoder decision
FDM : 1 # Fast Decision for Merge RD cost
#======== Quantization =============
QP : 32 # Quantization parameter(0-51)
MaxDeltaQP : 0 # CU-based multi-QP optimization
MaxCuDQPDepth : 0 # Max depth of a minimum CuDQP for sub-LCU-level delta QP
DeltaQpRD : 0 # Slice-based multi-QP optimization
RDOQ : 1 # RDOQ
RDOQTS : 1 # RDOQ for transform skip
#=========== Deblock Filter ============
DeblockingFilterControlPresent: 0 # Dbl control params present (0=not present, 1=present)
LoopFilterOffsetInPPS : 0 # Dbl params: 0=varying params in SliceHeader (param = base_param + GOP_offset_param); 1=constant params in PPS (param = base_param)
LoopFilterDisable : 0 # Disable deblocking filter (0=Filter, 1=No Filter)
LoopFilterBetaOffset_div2 : 0 # base_param: -13 ~ 13
LoopFilterTcOffset_div2 : 0 # base_param: -13 ~ 13
#=========== Misc. ============
InternalBitDepth : 8 # codec operating bit-depth
#=========== Coding Tools =================
SAO : 1 # Sample adaptive offset (0: OFF, 1: ON)
AMP : 1 # Asymmetric motion partitions (0: OFF, 1: ON)
TransformSkip : 1 # Transform skipping (0: OFF, 1: ON)
TransformSkipFast : 1 # Fast Transform skipping (0: OFF, 1: ON)
SAOLcuBoundary : 0 # SAOLcuBoundary using non-deblocked pixels (0: OFF, 1: ON)
#============ Slices ================
SliceMode : 0 # 0: Disable all slice options.
# 1: Enforce maximum number of LCU in a slice,
# 2: Enforce maximum number of bytes in a 'slice'
# 3: Enforce maximum number of tiles in a slice
SliceArgument : 1500 # Argument for 'SliceMode'.
# If SliceMode==1 it represents max. SliceGranularity-sized blocks per slice.
# If SliceMode==2 it represents max. bytes per slice.
# If SliceMode==3 it represents max. tiles per slice.
LFCrossSliceBoundaryFlag : 1 # In-loop filtering, including ALF and DB, is across or not across slice boundary.
# 0:not across, 1: across
#============ PCM ================
PCMEnabledFlag : 0 # 0: No PCM mode
PCMLog2MaxSize : 5 # Log2 of maximum PCM block size.
PCMLog2MinSize : 3 # Log2 of minimum PCM block size.
PCMInputBitDepthFlag : 1 # 0: PCM bit-depth is internal bit-depth. 1: PCM bit-depth is input bit-depth.
PCMFilterDisableFlag : 0 # 0: Enable loop filtering on I_PCM samples. 1: Disable loop filtering on I_PCM samples.
#============ Tiles ================
UniformSpacingIdc : 0 # 0: the column boundaries are indicated by ColumnWidth array, the row boundaries are indicated by RowHeight array
# 1: the column and row boundaries are distributed uniformly
NumTileColumnsMinus1 : 0 # Number of columns in a picture minus 1
ColumnWidthArray : 2 3 # Array containing ColumnWidth values in units of LCU (from left to right in picture)
NumTileRowsMinus1 : 0 # Number of rows in a picture minus 1
RowHeightArray : 2 # Array containing RowHeight values in units of LCU (from top to bottom in picture)
LFCrossTileBoundaryFlag : 1 # In-loop filtering is across or not across tile boundary.
# 0:not across, 1: across
#============ WaveFront ================
WaveFrontSynchro : 0 # 0: No WaveFront synchronisation (WaveFrontSubstreams must be 1 in this case).
# >0: WaveFront synchronises with the LCU above and to the right by this many LCUs.
#=========== Quantization Matrix =================
ScalingList : 0 # ScalingList 0 : off, 1 : default, 2 : file read
ScalingListFile : scaling_list.txt # Scaling List file name. If the file does not exist, the default matrix is used.
#============ Lossless ================
TransquantBypassEnableFlag: 0 # Value of PPS flag.
CUTransquantBypassFlagValue: 0 # Constant lossless-value signaling per CU, if TransquantBypassEnableFlag is 1.
#============ Rate Control ======================
RateControl : 0 # Rate control: enable rate control
TargetBitrate : 1000000 # Rate control: target bitrate, in bps
KeepHierarchicalBit : 1 # Rate control: keep hierarchical bit allocation in rate control algorithm
LCULevelRateControl : 1 # Rate control: 1: LCU level RC; 0: picture level RC
RCLCUSeparateModel : 1 # Rate control: use LCU level separate R-lambda model
InitialQP : 0 # Rate control: initial QP
RCForceIntraQP : 0 # Rate control: force intra QP to be equal to initial QP
### DO NOT ADD ANYTHING BELOW THIS LINE ###
### DO NOT DELETE THE EMPTY LINE BELOW ###
Your work folder should now contain ffmpeg, TAppEncoder, mp4box, VirtualDubMod, and your test.cfg file.
Next we need to get some content to encode. For this test I’ve downloaded the iPod version of Big Buck Bunny. I’m using a smaller resolution here because the encoder is very slow, and this is really just to test encoding and make sure we have everything set up correctly.
Copy your source file (BigBuckBunny_640x360.m4v in my case) to your work folder and rename it something easy to type, like bbb.m4v. Because this is a test we don’t want to process the entire video file, which runs over 14000 frames. First we’ll process the file into a raw .avi file so we can cut it into pieces accurately. To create the raw .avi file, open a command prompt and type the following:
ffmpeg -i bbb.m4v -pix_fmt yuv420p -vcodec rawvideo bbb.avi
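Fair warning: raw 4:2:0 video is huge – each 8-bit YUV420 frame takes width × height × 1.5 bytes – so check your free disk space before converting. A quick back-of-the-envelope sketch (the 14,000-frame total is my rough assumption for the full movie, not an exact count):

```python
# Estimate raw YUV 4:2:0 storage: 1 byte/pixel luma + 0.5 byte/pixel chroma.
def yuv420_frame_bytes(width: int, height: int) -> int:
    return width * height * 3 // 2

per_frame = yuv420_frame_bytes(640, 360)   # 345,600 bytes per frame
full_movie = 14_000 * per_frame            # assumed frame count for the full movie
print(f"~{full_movie / 2**30:.1f} GiB for the full movie")  # ~4.5 GiB
```

The 300-frame test segment we cut below comes out to only about 100 MiB, which is far friendlier.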
Open the resulting file in VirtualDubMod and cut out a segment of 300 or so frames. Go ahead and disable the audio stream, set processing to ‘direct stream copy’, and save the file under a simple name – I used ‘bbb_test.avi’. Next we need to strip the .avi header information so we have just a raw .yuv file. We do that in much the same way we created our .avi file:
ffmpeg -i bbb_test.avi -pix_fmt yuv420p bbb_test.yuv
This gives us a working .yuv file which we feed to TAppEncoder along with our config file. But first we need to update our config file to reflect our input. Open test.cfg and change the following:
BitstreamFile : bbb_test.hevc # the output file
ReconFile : z1.yuv # the reconstructed (decoded) .yuv the encoder writes out as it works – I name it so it sorts to the bottom of my folder and is easy to find and delete.
FrameRate : 24 # should match the source framerate
SourceWidth : 640 # Input frame width
SourceHeight : 360 # Input frame height
FramesToBeEncoded : 912 # Number of frames to be coded – should match the number of frames in your source; this is not done automatically!
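FramesToBeEncoded is the one that trips people up, since TAppEncoder won’t detect it for you. Because a raw .yuv file has no header, though, you can derive the frame count straight from the file size – a small sketch, assuming 8-bit 4:2:0 input (the filename is just my example):

```python
import os

def yuv420_frame_count(path: str, width: int, height: int) -> int:
    """Frame count of a headerless 8-bit YUV 4:2:0 file, from its byte size."""
    frame_bytes = width * height * 3 // 2  # full-res Y plane + quarter-res U and V
    return os.path.getsize(path) // frame_bytes

# e.g. yuv420_frame_count("bbb_test.yuv", 640, 360) -> 912 for a 315,187,200-byte file
```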
There are plenty of other settings we could change – many of which will have a large impact on quality – but for now we’ll leave those settings alone. Once you’ve updated everything be sure to save the config file and we’re ready to run TAppEncoder:
TAppEncoder -i bbb_test.yuv -c test.cfg
And now we wait! The HM10.1 reference encoder is single-threaded, so it will only occupy one core – you can keep using your computer in the meantime.
Once the file is encoded you can play it back with one of the tools listed in the previous post. If you use the Lentoid HEVC decoder, just rename ‘bbb_test.hevc’ to ‘bbb_test.hm10’. If you’d like to watch it with the Osmo4 player, you’ll need to mux the .hevc file into an .mp4 file by running:
mp4box -add bbb_test.hevc:fps=24 bbb_test.mp4
Congratulations! You’ve encoded your first HEVC video!
Next time we’ll look at settings to increase video quality (this encode was done at QP 32) and workflow optimizations that will allow us to speed up the encoding process by using faux multi-threading.
You can download the output files here: