The Secret World – on sale for $9.99

The Secret World is on sale at Amazon as a digital copy for $9.99 right now.  I'm not sure if this is just a 4th of July sale or what, but it's a good time to pick it up!  Patch 1.07 is waiting in the wings too, so there will be new content to play around with soon as we prepare to head to Tokyo and punch the Orochi Group in the face.

X360ce – An Xbox 360 Controller Emulator for Windows

So after buying Dungeons & Dragons: Chronicles of Mystara I found that it wouldn't support my $20.00 generic PS3 controller by default.  In fact, most games nowadays don't seem to support it – it seems only Xbox 360 controllers are supported by many newer games.  I usually don't find that a problem because I prefer keyboard and mouse in the vast majority of cases.  But for this game… I desperately needed a gamepad.

And that’s how I found X360ce.


It's an emulator that maps regular DirectInput devices to XInput calls – essentially letting you keep using the older, more widely supported input format that virtually all USB gamepads speak.  And it works like a charm.

All you do is download the application, put it in the folder with the game's executable, and run it once to create a controller mapping.

I'd encourage anyone without an Xbox 360 controller to give it a whirl!

http://code.google.com/p/x360ce/

Dungeons & Dragons: Chronicles of Mystara

Those of you who grew up in the '90s may remember the wonderful side-scrolling beat-'em-ups Capcom released using the D&D license: Dungeons & Dragons: Tower of Doom and Shadow over Mystara.  I remember my favorite part of going to the boardwalk at Ocean City as a kid was hitting up the arcades and dropping loads of quarters into Tower of Doom with my brother.

Well, they've been re-released and are available on Steam as Dungeons & Dragons: Chronicles of Mystara!  With multiplayer!  And new game modes!


The price is steep… but I've been wanting these games to be re-released with a multiplayer component ever since college, so I couldn't say no.  Seriously, I had plans to do an Unreal Tournament 2000 mod to recreate Tower of Doom back in freshman year… which pretty much ended with me realizing that creating that much artwork was way beyond me.  So yeah, I've wanted this for a long time.


Still one of the best feelings ever…

 

AMD Radeon 7970 for less than $300.00 on Newegg

My trusty old Radeon 6970 has been getting a bit long in the tooth lately – making some fussy noises and running hotter and hotter with no real change in my ambient temps – so today I went looking for something new.  Lo and behold, I found this incredible deal on Newegg:


That's a NEW (not refurbished) Radeon 7970 for $300.00 after rebate – with four AAA titles that I don't happen to own at the moment: Tomb Raider, Crysis 3, BioShock Infinite, and Far Cry 3.  I wonder if I'll even have time to play all of them this summer!

It’s one heck of a deal, and I jumped on it.

 

The Secret World – still the best leveling experience in MMOs

It's been a while since I last played The Secret World – I had finished the main story mission but never took the time to start grinding end-game nightmare dungeons.  TSW dropped its subscription requirement a while ago, so I decided to go back, play through the story missions again, and see the raid content that is available.  I haven't yet gotten through all of the elite dungeons to unlock the gatekeeper encounter, and I haven't finished the main story mission a second time, but the story is sucking me in just as much as it did the first time – it's just excellent.  Sure, there are bugs, and some missions will make you want to pull your hair out (Mainframe… grrrr), but the mystery and lore make it all worthwhile.  For instance, did you know the entire Egypt story mission, which focuses on the struggle against the Dark Pharaoh Akhenaten and the religious cult of Aten, is based on a historical figure?  http://en.wikipedia.org/wiki/Akhenaten  Pretty cool!

If you're someone who hasn't taken the plunge yet, I'd say it's well worth the $30.00 price of admission, and you can still get most of the new content for free with the Funcom points you're awarded – Last Train to Cairo has been my favorite content patch so far.  It's on sale right now on Steam for 50% off, too!

HM11.0 reference encoder released

A new version of the reference encoder has been released!  I've uploaded a copy here: http://www.mediafire.com/?amk4ba5fh3apph9.  It was once again compiled by JEEB and released on the Doom9 forums: http://forum.doom9.org/showthread.php?p=1632870#post1632870.

From his post:

The biggest change now is that the configuration files actually contain the profile and level. Before this, unless you actually remembered to add those two, your streams would be invalid.

So I guess the streams we've made up until now weren't technically valid – oops! 🙂  Otherwise, performance should be largely the same, but there have been changes to rate control and a number of other bugfixes, so an update sounds warranted.

Originally Posted by jct-vc
Compared to the release candidate we decided to revert a patch that caused problems with conformance test bitstreams. We also made a change to only warn when profile and level are not set instead of failing.

Compared to HM 10.1, HM 11.0 contains changes for rate control and a number of bug fixes. Performance in the common test conditions is not changed. We will still provide updated anchors with valid profile/level values within the next days.

Please note, that there are still quite a few open issues in the bug tracker. Most of them are related to high level issues like parameter set handling and reference picture sets.

Any help with fixing these issues and reviewing patches, especially regarding conformance issues, are highly appreciated.

For details see:

https://hevc.hhi.fraunhofer.de/trac/hevc/report/16

HEVC – Creating a custom GOP

Now that we’ve got a workable encoding process set up it’s time to start tinkering with how HEVC stores frames.  In order to do this we’re going to start modifying the config file where we declare the GOP structure.

Before we do, you should download the HM10.1 Reference Manual if you haven't already.  While there isn't much information about the workings of TAppEncoder on the web just yet, you can answer most questions by looking through this document.


Here is an example of a GOP that we’ve been using all along:

GOPSize : 4 # GOP Size (number of B slice = GOPSize-1)
# Type POC QPoffset QPfactor tcOffsetDiv2 betaOffsetDiv2 temporal_id #ref_pics_active #ref_pics reference pictures predict deltaRPS #ref_idcs reference idcs
Frame1: B 1 3 0.4624 0 0 0 4 4 -1 -5 -9 -13 0
Frame2: B 2 2 0.4624 0 0 0 4 4 -1 -2 -6 -10 1 -1 5 1 1 1 0 1
Frame3: B 3 3 0.4624 0 0 0 4 4 -1 -3 -7 -11 1 -1 5 0 1 1 1 1
Frame4: B 4 1 0.578 0 0 0 4 4 -1 -4 -8 -12 1 -1 5 0 1 1 1 1

We will be using this GOP to understand how GOPs are constructed and from there I’ll share my own first attempts at creating custom GOPs.

First let’s look at all the parts of each frame.

# – The frame number.  Frames are listed in decoding order (not display order).

Type – Specifies the frame type – can be I, P, or B.

1) I slice: A slice in which all CUs of the slice are coded using only intrapicture prediction.
2) P slice: In addition to the coding types of an I slice, some CUs of a P slice can also be coded using interpicture prediction with at most one motion-compensated prediction signal per PB (i.e., uniprediction). P slices only use reference picture list 0.
3) B slice: In addition to the coding types available in a P slice, some CUs of the B slice can also be coded using interpicture prediction with at most two motion-compensated prediction signals per PB (i.e., biprediction). B slices use both reference picture list 0 and list 1.

POC – Picture Order Count – The display order of the frame.

Note the display order (POC) and the decode order (Frame #) may be different.

QPoffset – QP offset is added to the QP parameter to set the final QP value to use for this frame.  If encoding at constant QP 18, QPoffset 2 would code the frame at QP20.

QPfactor – Weight used during rate distortion optimization. Higher values mean lower quality and fewer bits. Typical range is between 0.3 and 1.

tcOffsetDiv2 – In-loop deblocking filter parameter tcOffsetDiv2 is added to the base parameter LoopFilterTcOffset_div2 and must result in an integer ranging from -6 to 6.*

betaOffsetDiv2 – In-loop deblocking filter parameter betaOffsetDiv2 is added to the base parameter LoopFilterBetaOffset_div2 and must result in an integer ranging from -6 to 6.*

*presumably these two options are for setting per-frame deblocking strength.  I have not personally tested these options yet.

temporal_id – Temporal layer of the frame. A frame cannot predict from a frame with a higher temporal ID. If a frame with a higher temporal ID is listed among a frame's reference pictures, it is not used, but is kept for possible use in future frames.  I haven't found any use for this option yet.

num_ref_pics_active – Size of reference picture lists L0 and L1, indicating how many reference pictures in each direction are used during coding.

num_ref_pics – The number of reference pictures kept for this frame. This includes pictures that are used for reference for the current picture as well as pictures that will be used for reference in the future.

reference_pictures – A space-separated list of integers, specifying the POC of the reference pictures kept, relative to the POC of the current frame. The list shall be ordered, first with negative numbers from largest to smallest, followed by positive numbers from smallest to largest (e.g. -1 -3 -5 1 3).

predict – accepts values of 0, 1, or 2

0 – indicates that the reference picture set is encoded without inter RPS prediction, and the subsequent parameters deltaRIdx-1, deltaRPS, num_ref_idcs and reference_idcs are ignored and do not need to be present.  Note that although this frame's RPS is coded without inter RPS prediction, its reference_pictures are still available for other frames to use.

1 – the reference picture set is encoded with inter RPS prediction, using the subsequent parameters deltaRIdx-1, deltaRPS, num_ref_idcs and reference_idcs in the line.

2 – the reference picture set is encoded with inter RPS prediction, but only the deltaRIdx-1 parameter is needed. The deltaRPS, num_ref_idcs and reference_idcs values are automatically derived by the encoder based on the POC and refPic values of the current line and the RPS pointed to by the deltaRIdx-1 parameter.

deltaRIdx-1 – The difference between the index of the current RPS and that of the predictor RPS, minus 1.

deltaRPS – The difference between the POC of the predictor frame and the POC of the current frame.

num_ref_idcs – The number of reference idcs to encode for the current frame. The value is equal to the num_ref_pics of the predictor frame plus 1.

reference_idcs – A space-separated list of integers, specifying the ref idcs of the inter RPS prediction. The value of each ref idc may be 0, 1 or 2, indicating that the reference picture is a reference picture used by the current picture, a reference picture used for a future picture, or not a reference picture anymore, respectively. The first num_ref_pics ref idcs correspond to the reference pictures in the predictor RPS. The last ref idc corresponds to the predictor picture itself.

Whew, that's a lot of verbiage and a lot of it may seem unclear, but as we look at a real-world example it should make more sense.

So, in the example we've been using, let's start with Frame1:

Frame1: B 1 3 0.4624 0 0 0 4 4 -1 -5 -9 -13 0

Frame1 is the first DECODED picture.  It is specified as a B-frame.  Its POC is 1, so it is also the first DISPLAYED frame.  It has a QPoffset of 3 and a QPfactor of 0.4624.  It does not modify in-loop deblocking strength.  It has 4 active reference pictures and makes use of 4 reference pictures, defined as -1, -5, -9, and -13.  Predict is set to 0, so it does not need any further information to define its temporal dependencies.

Most of that is straightforward, but what we want to look at is the reference pictures.  Because Frame1 is POC 1, the frames it will reference have values of:

1 - 1 = 0
1 - 5 = -4
1 - 9 = -8
1 - 13 = -12

So in a series of pictures if we were on type Frame1 at POC 25 it would reference frames 24, 20, 16, and 12.

Then for Frame 2:

Frame2: B 2 2 0.4624 0 0 0 4 4 -1 -2 -6 -10 1 -1 5 1 1 1 0 1

Reference pictures for this frame will be:

2 - 1 = 1
2 - 2 = 0
2 - 6 = -4
2 - 10 = -8

So in a series of pictures if we were on type Frame2 at POC 26 it would reference frames 25, 24, 20, and 16.

Frame3 and Frame4 follow a similar pattern, referencing the POC immediately before them plus earlier POCs at multiples of four (the anchor frames), for a total of four reference pictures each.
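To make the arithmetic concrete, here's a tiny Python sketch – just a helper I threw together, nothing to do with the HM tools themselves – that expands the relative reference lists of this low_delay GOP into absolute POCs for any frame:

# Expand the relative reference lists of the low_delay GOP above into
# absolute POCs. Frame types repeat every GOPSize frames, so the type of
# a given POC is just its offset within the GOP.
GOP = {
    1: [-1, -5, -9, -13],   # Frame1
    2: [-1, -2, -6, -10],   # Frame2
    3: [-1, -3, -7, -11],   # Frame3
    4: [-1, -4, -8, -12],   # Frame4
}
GOP_SIZE = 4

def reference_pocs(poc):
    frame_type = ((poc - 1) % GOP_SIZE) + 1   # 1..4, matching Frame1..Frame4
    return [poc + rel for rel in GOP[frame_type]]

print(reference_pocs(25))   # Frame1 at POC 25 -> [24, 20, 16, 12]
print(reference_pocs(26))   # Frame2 at POC 26 -> [25, 24, 20, 16]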


If that’s all there was to creating GOPs then it would be easy.  However, you’re limited in which pictures you can choose as reference pictures in a given frame by what frames are available to the predictor frame.  The frames which are available must be defined in the reference_idcs.  So going back to Frame 2:

Frame2: B 2 2 0.4624 0 0 0 4 4 -1 -2 -6 -10 1 -1 5 1 1 1 0 1

deltaRPS is listed as -1.  This defines the predictor frame.  The predictor frame doesn’t gain any special significance as a reference (or even need to be used as a reference) but what it does do is define what reference pictures are available for the current Frame.  deltaRPS is equal to the predictor frame POC minus the current frame POC – so in this case the predictor frame (Frame1) has POC 1 and the current frame (Frame2) has POC 2:

1 - 2 = -1

From there we need to define the num_ref_idcs which in this case is 5.  The num_ref_idcs equals the num_ref_pics of the predictor frame plus 1.  All our Frames use 4 reference pictures so the number of reference idcs is 5.

Then we need to declare which idcs (corresponding to reference pictures in the predictor) will be used for the frame and thus carried forward for use in further frames.  In this case the frame uses reference idcs of: 1 1 1 0 1

In order to determine what these numbers should be we start by looking at the reference pictures of the predictor frame:

Frame1 is POC 1.  Frame1’s first reference frame is -1.

1 - 1 = 0.

Then we look at Frame 2:

Frame2 is POC 2.  Frame2 has reference frame -2.

2 - 2 = 0.

Because both frames reference the picture at 0 the FIRST reference_idcs is listed as a ‘1’ – that is to say the reference picture is used AND it will be available for other frames to reference if they use Frame2 as a predictor.

We must go through each reference picture for the predictor frame (Frame1) and make this determination.

Frame1 reference picture 2 is -5. 1 - 5 = -4.
Frame2 has reference picture -6. 2 - 6 = -4.
The second reference idcs is 1.

Frame1 reference picture 3 is -9. 1 - 9 = -8.
Frame2 has reference picture -10. 2 - 10 = -8.
The third reference idcs is 1.

Frame1 reference picture 4 is -13. 1 - 13 = -12.
Frame2 has no reference picture at -12.
The fourth reference idcs is 0.

Finally, we evaluate whether the predictor frame itself will also be kept as a reference.

Frame1 is at POC 1.
Frame2 has reference picture -1. 2 - 1 = 1.
The fifth reference idcs is 1.

1 1 1 0 1 – easy!  Just remember that the order of the reference idcs MUST correspond to evaluating the predictor frame's reference pictures from left to right, followed by evaluating whether the predictor frame itself is a reference picture.  Also note that our current frame has 4 reference pictures, so it should have exactly 4 reference_idcs listed as '1'.  If not, the encoder will throw errors and probably crash.
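If you'd rather not walk through the idcs by hand every time, here's a small Python sketch of the same derivation.  It's a simplified helper of my own – it only distinguishes 'used' (1) from 'unused' (0), which is all our examples need:

# Derive the reference idcs for a frame from its predictor, as walked
# through above. Each of the predictor's reference pictures, plus the
# predictor itself, gets a 1 if the current frame also keeps that
# picture and a 0 otherwise.
def reference_idcs(current_refs, predictor_refs, delta_rps):
    # The predictor's refs and the predictor itself, expressed relative
    # to the *current* frame's POC.
    candidates = [r + delta_rps for r in predictor_refs] + [delta_rps]
    return [1 if c in current_refs else 0 for c in candidates]

frame1 = [-1, -5, -9, -13]   # predictor (Frame1)
frame2 = [-1, -2, -6, -10]   # current frame (Frame2)
print(reference_idcs(frame2, frame1, delta_rps=-1))   # [1, 1, 1, 0, 1]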

With this, we can go about making our own GOP.  Because the simple low_delay config seemed very effective in my previous tests, I'd like to update it to provide more reference pictures.  In the current low_delay setup there are 4 reference pictures and the furthest temporal reference is Frame1's at -13.  I want to keep reaching back that far, but expand the reference pictures to include the ones the current setup leaves out.

GOPSize must always be a multiple of 2, so we'll create a GOP with 14 frames.  The number of reference pictures will likewise be 14.  (Note: in my testing, setting num_ref_pics above 15 resulted in an encoder crash.)  Otherwise we'll use a setup similar to low_delay, with the first frame being 'predict 0' and slight variations in QPoffset to maintain consistent quality.  Because every frame in the GOP references all 14 frames preceding it, QPfactor should be a non-issue, so we'll set it to the same value for each frame.

Here’s what I came up with:

Ref 14 GOP 14

# Type POC QPoffset QPfactor tcOffsetDiv2 betaOffsetDiv2 temporal_id #ref_pics_active #ref_pics reference pictures predict deltaRPS #ref_idcs reference idcs
Frame1: B 1 2 0.5 0 0 0 14 14 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10 -11 -12 -13 -14 0
Frame2: B 2 3 0.5 0 0 0 14 14 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10 -11 -12 -13 -14 1 -1 15 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
Frame3: B 3 2 0.5 0 0 0 14 14 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10 -11 -12 -13 -14 1 -1 15 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
Frame4: B 4 3 0.5 0 0 0 14 14 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10 -11 -12 -13 -14 1 -1 15 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
Frame5: B 5 1 0.5 0 0 0 14 14 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10 -11 -12 -13 -14 1 -1 15 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
Frame6: B 6 2 0.5 0 0 0 14 14 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10 -11 -12 -13 -14 1 -1 15 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
Frame7: B 7 3 0.5 0 0 0 14 14 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10 -11 -12 -13 -14 1 -1 15 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
Frame8: B 8 2 0.5 0 0 0 14 14 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10 -11 -12 -13 -14 1 -1 15 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
Frame9: B 9 3 0.5 0 0 0 14 14 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10 -11 -12 -13 -14 1 -1 15 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
Frame10: B 10 1 0.5 0 0 0 14 14 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10 -11 -12 -13 -14 1 -1 15 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
Frame11: B 11 2 0.5 0 0 0 14 14 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10 -11 -12 -13 -14 1 -1 15 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
Frame12: B 12 3 0.5 0 0 0 14 14 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10 -11 -12 -13 -14 1 -1 15 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
Frame13: B 13 2 0.5 0 0 0 14 14 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10 -11 -12 -13 -14 1 -1 15 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
Frame14: B 14 1 0.5 0 0 0 14 14 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10 -11 -12 -13 -14 1 -1 15 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
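Typing out fourteen nearly identical lines by hand is a good way to fat-finger an idc, so here's a little Python sketch that prints the table above.  It's just my own generator for this particular layout; the QP offsets are simply the ones I chose:

# Print the 14-frame, 14-reference GOP table so the idcs don't have to
# be typed by hand. Adjust qp_offsets to taste.
NUM_FRAMES = 14
NUM_REFS = 14
qp_offsets = [2, 3, 2, 3, 1, 2, 3, 2, 3, 1, 2, 3, 2, 1]

refs = " ".join(str(-i) for i in range(1, NUM_REFS + 1))   # "-1 -2 ... -14"

for poc in range(1, NUM_FRAMES + 1):
    base = f"Frame{poc}: B {poc} {qp_offsets[poc - 1]} 0.5 0 0 0 {NUM_REFS} {NUM_REFS} {refs}"
    if poc == 1:
        # First frame of the GOP codes its RPS explicitly (predict 0).
        print(base + " 0")
    else:
        # Predictor is the previous frame (deltaRPS = -1, num_ref_idcs = 15).
        # Thirteen of its refs are reused, its oldest ref (-15 relative to
        # us) is dropped, and the predictor itself is kept.
        idcs = " ".join(["1"] * (NUM_REFS - 1) + ["0", "1"])
        print(base + f" 1 -1 {NUM_REFS + 1} {idcs}")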

I did some tests with this and it ultimately gives a much lower bitrate (8-12%) but also a slightly lower PSNR than the low_delay config did.  I was surprised because I thought the PSNR would be higher as well.  Maybe QPfactor is having more impact than I thought?  That’s something I’ll test out in the future.

But until then, the main thing that we now understand is how to set up our reference_idcs.  The above example still uses a very basic linear frame setup but you could also set up some bi-directional coding similar to the Random_Access.config included with the TAppEncoder download I’ve linked here.

For those who are interested I’ve also made a custom config similar to Random_Access – or at least one that uses out-of-order POC.  I made this just to see if it would work and the results are poor as far as bitrate/PSNR go so I wouldn’t recommend using it for anything:

Ref 12 GOP 12

Frame1: B 1 1 0.33 0 0 0 12 12 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10 -11 -12 0
Frame2: B 2 2 0.44 0 0 0 12 12 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10 -11 -12 1 -1 13 1 1 1 1 1 1 1 1 1 1 1 0 1
Frame3: B 3 3 0.55 0 0 0 12 12 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10 -11 -12 1 -1 13 1 1 1 1 1 1 1 1 1 1 1 0 1
Frame4: B 4 2 0.44 0 0 0 12 12 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10 -11 -12 1 -1 13 1 1 1 1 1 1 1 1 1 1 1 0 1
Frame5: B 12 1 0.33 0 0 0 12 12 -8 -9 -10 -11 -12 -13 -14 -15 -16 -17 -18 -19 1 -8 13 1 1 1 1 1 1 1 1 1 1 1 0 1
Frame6: B 11 2 0.44 0 0 0 12 12 -7 -8 -9 -10 -11 -12 -13 -14 -15 -16 -17 1 1 1 13 1 1 1 1 1 1 1 1 1 1 1 0 1
Frame7: B 10 3 0.55 0 0 0 12 12 -6 -7 -8 -9 -10 -11 -12 -13 -14 -15 1 2 1 1 13 1 1 1 1 1 1 1 1 1 1 0 1 1
Frame8: B 9 2 0.44 0 0 0 12 12 -5 -6 -7 -8 -9 -10 -11 -12 -13 1 2 3 1 1 13 1 1 1 1 1 1 1 1 1 0 1 1 1
Frame9: B 8 1 0.33 0 0 0 12 12 -4 -5 -6 -7 -8 -9 -10 -11 1 2 3 4 1 1 13 1 1 1 1 1 1 1 1 0 1 1 1 1
Frame10: B 7 2 0.44 0 0 0 12 12 -3 -4 -5 -6 -7 -8 -9 1 2 3 4 5 1 1 13 1 1 1 1 1 1 1 0 1 1 1 1 1
Frame11: B 6 3 0.55 0 0 0 12 12 -2 -3 -4 -5 -6 -7 1 2 3 4 5 6 1 1 13 1 1 1 1 1 1 0 1 1 1 1 1 1
Frame12: B 5 2 0.33 0 0 0 12 12 -1 -2 -3 -4 -5 1 2 3 4 5 6 7 1 1 13 1 1 1 1 1 0 1 1 1 1 1 1 1

Good luck making your own custom GOPs!

HEVC – GOPs, seeking issues, multi-threading, and a kludgey solution

Alright, now we've done our fair share of encoding with TAppEncoder and should have a good grasp of the basics.  We've tinkered a bit, but haven't really found the super settings that are the holy grail of hobbyist video encoding.  What we have found, though, is a recurring problem in our test files – seeking is atrociously slow with all three of the decoders I have available.  I would assume it's equally slow with the 'smarter' libav fork, because that's what Osmo4's decoder is based off of.  So, before we go further with tinkering, we first need to figure out what's causing this problem and a means to fix it – because any HEVC encode that you can't skip around in is as good as useless.

To understand what's causing the slow seeking we first have to look at how the declared GOPs work in an HEVC stream.  Up until now we've just used pre-made configuration files that have this part set up already.  We've changed some marginal parts – such as the QPoffset – but haven't dealt with the reference pictures or the ref idcs.

#       Type POC QPoffset QPfactor tcOffsetDiv2 betaOffsetDiv2  temporal_id #ref_pics_active #ref_pics reference pictures predict deltaRPS #ref_idcs reference idcs 

Frame1:  B    1   3        0.4624   0            0               0           4                4         -1 -5 -9 -13       0

Frame2:  B    2   2        0.4624   0            0               0           4                4         -1 -2 -6 -10       1      -1       5         1 1 1 0 1

Frame3:  B    3   3        0.4624   0            0               0           4                4         -1 -3 -7 -11       1      -1       5         0 1 1 1 1            

Frame4:  B    4   1        0.578    0            0               0           4                4         -1 -4 -8 -12       1      -1       5         0 1 1 1 1

In the above list from the low_delay .config the GOP is set up as having 4 frames.  Each frame has 4 reference pictures – both ref_pics and ref_pics_active are set to 4.  Following that is a list of reference frame coordinates.  The reference frame POC is equal to the current frame’s POC plus the listed value.

Frame1 is POC 1.  Frame1 lists reference frames of -1, -5, -9, and -13.  The GOP structure definition only lists POCs 1-4 to represent a single GOP – throughout the HEVC bitstream the POC number keeps incrementing across GOPs.  So GOP 1 contains POCs 1-4, GOP 2 contains 5-8, and so on and so forth.  In a video with 1000 frames the file would contain POCs 1-1000.  Every fourth frame will be of type Frame1, so frames 1, 5, 9, 13, 17, 21, and so on are all of type Frame1.  Frame 21 would thus have reference frames at:
21 – 1 = POC 20
21 – 5 = POC 16
21 – 9 = POC 12
21 – 13 = POC 8

Easy, right?  Now look at the reference structure our low delay config produces for its first 16 frames.  Notice anything?  Every frame has a reference of -1.  Thus, to decode frame 1000 of a 1000-frame video we need to have the data for frame 999 in memory.  To decode frame 999 we need the data for frame 998.  And so on and so forth.  This is why seeking becomes slower in our test encodes the further we move from the i-frame.  If you seek to frame 100, the decoder has to chug through 100 frames before it can start playing again.  If you seek to frame 1000, it has 10 times as much work to do.  In a normal 25-minute TV episode you're looking at over 30,000 frames – far too many to decode, even for the fastest computers today.

But shouldn’t inserting i-frames fix this?  It does!  But only if you specify a closed GOP – which the configuration files we first tested did not.  From this reference pdf we read:

A. Random Access and Bitstream Splicing Features

The new design supports special features to enable random access and bitstream splicing. In H.264/MPEG-4 AVC, a bitstream must always start with an IDR access unit. An IDR access unit contains an independently coded picture—i.e., a coded picture that can be decoded without decoding any previous pictures in the NAL unit stream. The presence of an IDR access unit indicates that no subsequent picture in the bitstream will require reference to pictures prior to the picture that it contains in order to be decoded. The IDR picture is used within a coding structure known as a closed GOP (in which GOP stands for group of pictures).

The new clean random access (CRA) picture syntax specifies the use of an independently coded picture at the location of a random access point (RAP), i.e., a location in a bitstream at which a decoder can begin successfully decoding pictures without needing to decode any pictures that appeared earlier in the bitstream, which supports an efficient temporal coding order known as open GOP operation. Good support of random access is critical for enabling channel switching, seek operations, and dynamic streaming services. Some pictures that follow a CRA picture in decoding order and precede it in display order may contain interpicture prediction references to pictures that are not available at the decoder. These nondecodable pictures must therefore be discarded by a decoder that starts its decoding process at a CRA point.

We see this option in the config file:

DecodingRefreshType : 1 # Random Accesss 0:none, 1:CDR, 2:IDR

CDR creates files with an open GOP – all frames can reference all other frames.  IDR creates closed GOPs – frames can only reference other frames within a GOP.  In practice, setting IDR still allows frames from separate GOPs to be used as references; the difference is that it creates a hard break at every i-frame, just like the initial i-frame.

When you first begin to encode a file you may notice that the encoder automatically adjusts the first GOPs after the initial i-frame.  This encode uses the low delay config we've been looking at.  Frame 4 would be POC 4, and you would expect it to have reference frames of -1, -4, -8, and -12.  That would equate to reference frames at POC 3, 0, -4, and -8, two of which do not exist.  The encoder automatically adjusts the GOP to use existing POCs during these first frames, so where the log prints L0 we see it has selected 3, 2, 1, 0 instead.  Each intra-frame is independent of the previous segment – the references of each segment terminate at that segment's first i-frame rather than running all the way back to Frame1 of the file.  Hence, if we set an intra-period of 320, the largest number of frames to be decoded for a seek is now 319.  So now we can create seekable HEVC files – yay!
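As a quick sanity check on that seek cost, here's a toy calculation (plain arithmetic, nothing from HM itself) comparing how many frames a decoder has to chew through with and without a periodic i-frame:

# Rough count of how many frames have to be decoded before a seek target
# can be shown when every frame references the one before it. With no
# intra refresh the chain runs back to frame 0; with a periodic i-frame
# it only runs back to the preceding i-frame.
def frames_to_decode(target, intra_period=None):
    if intra_period is None:
        return target                 # chain all the way back to frame 0
    return target % intra_period      # distance from the preceding i-frame

print(frames_to_decode(1000))                      # 1000 frames of work
print(frames_to_decode(30000))                     # a whole TV episode - hopeless
print(frames_to_decode(30000, intra_period=320))   # 240 frames - manageable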

But encoding is still slow as molasses.  We need to increase our speed somehow.  One convenient thing we can do with multiple HEVC streams is concatenate them into a single .mp4 file.  We do this using mp4box and the following commands:

mp4box -add test.hevc:fps=24 test.mp4

mp4box -cat test2.hevc:fps=24 test.mp4

This will create an .mp4 file that seamlessly transitions from the first video file to the second.  There are two difficulties here.  First, if the first file ends on a frame in the middle of a GOP, there will be a noticeable tearing effect when the transition to the second file occurs.  Second, there will be an i-frame at every splice point regardless of the intra-period we set in the config file.  If you splice lots of little files together you'll wind up with a proliferation of i-frames, which will increase the bitrate of your final encode.  (In my testing, using a low intra-period of 30 increased filesize by 10-15% vs. an intra-period of 300 in a 12000-frame encode.  We can assume that going from 300 to 3000 would reduce filesize by a much smaller amount – most likely 1-1.5%.  Setting your intra-period to something between 250 and 500 thus seems reasonable to me at this point.)

We can easily overcome these difficulties with smart management of our encode process. Since we already know the intra-period we want we can simply segment our source file where we expect there to be an i-frame.  We’ll do this using VirtualDubMod.

In VirtualDubMod, when you select 'save as' there is an option to save a segmented .avi file where you can specify the number of frames you'd like in each file.  However, it's buggy, so we can't just put in the number we'd like.  Let's say we need .avi files with 320 frames each (or we could do multiples of 320).  If you put in 320, the segmented .avis will be saved with the FIRST file having 320 frames but every other segment having 321 frames.  This creates a problem because we need all of our segments to line up with i-frames.  We could set the first segment to have 319 frames, but then it ends in the middle of a GOP and will create tearing when we concatenate the files.  Still, we want to tell VirtualDubMod to segment our file at 319 frames (320 is what it will actually do) and then go back and correct the first segment afterward.  To do this, first select ONLY the first frame of the video in VirtualDubMod and hit 'delete' to get rid of it.  The frame positions now line up like this:

Original:   1     2     3     4     5     ...     317     318     319     320     321

Edit:       2     3     4     5     6     ...     318     319     320     321     322

Segmenting the video with the first segment at 319 frames will thus end on the 320th frame of the original file.  I saved my file as 'bbb.avi' and the output was 45 files labeled bbb.00.avi -> bbb.44.avi.  *Remember to disable the audio stream before saving.*  Once that's done, delete the first segment (bbb.00.avi).  Go back into VirtualDubMod and select 'edit -> revert all edits'; this restores the frame you deleted.  Now select frames from 320 to the end and delete them, then save the remaining 320 frames as bbb.00.avi.

Whew, now we have 44 .avi files with 320 frames apiece, plus the remainder in the final .avi.  Next comes a fun part… manually converting these all back to .yuv files using ffmpeg as detailed in my previous post.  Breaking this file into 44 pieces is a bit excessive – in my testing I try to break files into segments that fit my schedule, so I can start a segment when I go to work and have it finished when I get home, or when I go to sleep.  You can break a file into far fewer pieces if you choose – just make sure you cut it on frames that would be i-frames.
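If you don't feel like typing the ffmpeg command 45 times, a short Python sketch can loop over the segments.  The exact ffmpeg arguments below are assumed to match what I used in the previous post – adjust the pixel format and file names for your own source:

# Batch-convert the segmented .avi files back to raw .yuv with ffmpeg.
import subprocess

NUM_SEGMENTS = 45   # bbb.00.avi .. bbb.44.avi

for i in range(NUM_SEGMENTS):
    avi = f"bbb.{i:02d}.avi"
    yuv = f"bbb.{i:02d}.yuv"
    subprocess.run(["ffmpeg", "-i", avi, "-f", "rawvideo",
                    "-pix_fmt", "yuv420p", yuv], check=True)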

From there, choose a config file you'd like to use – for my tests here I'm using encoder_randomaccess_main.cfg because I'd like to get some more testing done on those settings.  Because the random access config has a GOP of 8 and the intra-period must always be a multiple of the GOP size, we use 320 as our intra-period (this is why we segmented the file into 320-frame pieces).

If you'd like to exploit this segmented workflow for multi-threading, just create as many .cfg files as cores you'd like to use.  Remember that your output file is assigned in the 'BitstreamFile' option of the config file, so be sure to update that value for each input you use.  Also give each config file a separate ReconFile – I'm not 100% sure this is necessary, but why not?  Having six different processes trying to access the same file certainly sounds like a bad idea.  As an example, here are the first few lines of my config file for segment 13:

#======== File I/O =====================

BitstreamFile                 : bbb13.hevc
ReconFile                     : z4.yuv

FrameRate                     : 24          # Frame Rate per second
FrameSkip                     : 0           # Number of frames to be skipped in input
SourceWidth                   : 640        # Input  frame width
SourceHeight                  : 360         # Input  frame height
FramesToBeEncoded             : 320        # Number of frames to be coded

#======== Unit definition ================

MaxCUWidth                    : 64          # Maximum coding unit width in pixel

MaxCUHeight                   : 64          # Maximum coding unit height in pixel

MaxPartitionDepth             : 4           # Maximum coding unit depth

QuadtreeTULog2MaxSize         : 5           # Log2 of maximum transform size for

                                            # quadtree-based TU coding (2...6)

QuadtreeTULog2MinSize         : 2           # Log2 of minimum transform size for

                                            # quadtree-based TU coding (2...6)

QuadtreeTUMaxDepthInter       : 3

QuadtreeTUMaxDepthIntra       : 3

#======== Coding Structure =============

IntraPeriod                   : 320          # Period of I-Frame ( -1 = only first)

DecodingRefreshType           : 2           # Random Accesss 0:none, 1:CDR, 2:IDR

GOPSize                       : 8           # GOP Size (number of B slice = GOPSize-1)

#Type POC QPoffset QPfactor tcOffsetDiv2 betaOffsetDiv2 temporal_id #ref_pics_active #ref_pics reference pictures     predict deltaRPS #ref_idcs reference idcs 

Frame1:  B    8   1        0.442    0            0              0           4                4         -8 -10 -12 -16         0

Frame2:  B    4   2        0.3536   0            0              0           2                3         -4 -6  4               1       4        5         1 1 0 0 1

Frame3:  B    2   3        0.3536   0            0              0           2                4         -2 -4  2 6             1       2        4         1 1 1 1  

Frame4:  B    1   4        0.68     0            0              0           2                4         -1  1  3 7             1       1        5         1 0 1 1 1 

Frame5:  B    3   4        0.68     0            0              0           2                4         -1 -3  1 5             1      -2        5         1 1 1 1 0

Frame6:  B    6   3        0.3536   0            0              0           2                4         -2 -4 -6 2             1      -3        5         1 1 1 1 0

Frame7:  B    5   4        0.68     0            0              0           2                4         -1 -5  1 3             1       1        5         1 0 1 1 1  

Frame8:  B    7   4        0.68     0            0              0           2                4         -1 -3 -7 1             1      -2        5         1 1 1 1 0

Open as many command prompts as cores you'd like to encode with and get cracking.  For myself, I have 6 cores and ran 5 encodes at a time so I could still use the computer while encoding.  I just increased my HEVC productivity 5 times – sweet 🙂
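If you'd rather not babysit a pile of command prompts, here's a rough Python sketch that launches the encoder processes for you.  The executable path and the per-segment config names (seg00.cfg and so on) are assumptions – point them at whatever you actually created, and run it in batches that match your core count:

# Launch several TAppEncoder instances in parallel, one per config file.
# Each .cfg points at its own input, BitstreamFile and ReconFile.
import subprocess

ENCODER = "TAppEncoder.exe"                        # path to the HM encoder
configs = [f"seg{i:02d}.cfg" for i in range(5)]    # one cfg per core in use

procs = [subprocess.Popen([ENCODER, "-c", cfg]) for cfg in configs]
for p in procs:
    p.wait()                                       # wait for every segment to finish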


Once that's all done your work folder should be full of numbered .hevc segments (I've cleaned up the .avi and .yuv files).  Now we have to manually add all of these files into a single .mp4 file.  We do this with the following:

mp4box -add bbb00.hevc:fps=24 bbb.mp4

mp4box -cat bbb01.hevc:fps=24 bbb.mp4

….

mp4box -cat bbb44.hevc:fps=24 bbb.mp4
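Rather than typing 45 mp4box commands, the same looping trick works here too – this sketch assumes the bbb00.hevc through bbb44.hevc naming from my folder:

# Run the mp4box -add / -cat commands above in a loop. The first segment
# is added, every later segment is concatenated onto the output.
import subprocess

NUM_SEGMENTS = 45
OUTPUT = "bbb.mp4"

for i in range(NUM_SEGMENTS):
    segment = f"bbb{i:02d}.hevc:fps=24"
    flag = "-add" if i == 0 else "-cat"
    subprocess.run(["mp4box", flag, segment, OUTPUT], check=True)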

And finally mux in the audio from the source file.  I re-encoded the audio so I’ll use the following:

mp4box -add bbbaudio.mp3 bbb.mp4

Rename your output to something professional-looking.  That's it!  You now have a fast-seeking, workflow-optimized HEVC movie to watch in haughty superiority.

You can download my HEVC encode of Big Buck Bunny if you'd like a reference, but the quality is not very good coming from the iPad source and using the RandomAccess config.  Also, I made a mistake when creating the file, so I had to re-encode the audio with a new delay; it seems to sync up, but if it's off, that's mea culpa.

Next time we'll be looking more closely at GOPs – particularly how reference frame decisions are made – and we'll be making our first (not very good) custom GOP.  From here on out I think it's a safe bet that the quality gains we can squeeze out of the reference encoder will come from good GOP structures.  Happy encoding until then!

HEVC in MKV – support is getting closer!


DivxLabs recently released a patched version of MKVToolNix that supports muxing HEVC streams into the Matroska container.  I downloaded it and managed to mux and then extract an HEVC stream from a Matroska file, and it worked perfectly.  MediaInfo shows the video stream correctly (I muxed it with an improper framerate here), but no decoder hooks in to actually play the video just yet.  I'm sure we'll see a complete solution for decoding HEVC in Matroska soon, but until then at least we have muxing taken care of!

Opera Next – First Impressions


Opera is my day-to-day usage browser and has been for the past several years.  Ever since the first time I used Speed Dial I knew it was the browser for me.  It meant no more hiding my daily stomping grounds in favorites lists but rather having them displayed like windows ready to peek through with a single click.  With Opera Sync I’ve had the same Speed Dial for years – never mind the dozens of OS re-installs I’ve done in that time.  When I’m setting up my PC I know that getting my internet back to the way I like it is as easy as installing Opera, enabling sync, and waiting a few seconds for all my pinned sites, favorites, and passwords to be restored.  Not to mention ‘paste and go’ was the best thing to ever happen to browsers.  And since then what have we seen?  Google, Mozilla, and Microsoft have all copied those features.

So I love Opera.  Always have, even the not so stellar releases.  But this latest release… there’s something unsettling about all of the changes and concessions Opera has made with their next browser.  It still has the same sparse look – even sparser now with fewer tabs, only a single search/address bar, and no side panels – and the same great dashboard.  But it feels less unique and I regret that.


The main change you'll find in Opera Next is the rendering engine.  Opera used to develop its own tech for this, but now it uses WebKit – the same engine as its competitors Chrome and Safari.  While this will have some benefits in helping bring more speed to the Opera browser, it will also make your Opera experience more samey.  And considering WebKit-based browsers make up over 40% of the market, it also takes away the special snowflake feeling that using Opera used to give – and probably the security of using a browser nobody felt like taking the time to hack.

In the beta Opera sync is also not working.  You can log into the web portal and manually add your saved sites to the speed dial, but it’s not automatic now.  Nor does it retain your previous passwords.  Hopefully this will be fixed in the coming months.

The e-mail client is gone as well.  It was apparently using too many resources – a laughable assertion – so they've stripped it out to make the browser seem smaller and more efficient.  For me, using the beta for the past few days and not having my mail instantly pop up while I'm browsing has been a real letdown.  Honestly, I don't know who Opera thinks they're pleasing with this – I'd bet 75% of Opera users don't even know the mail client is there.  Why remove something that people can simply choose not to use and that is very helpful to others?


There’s also a new ‘Discover’ option that pretty much just shows you trendy news and blog posts.  If you feel there aren’t enough services trying to target you with inane articles then I guess you might want to take a look at Opera.

There are some good aspects – Netflix finally works in Opera.  Yay I guess?  And there is one nice new feature that lets you temporarily save a site you want to return to in your ‘stash’ which can be accessed from the Speed Dial.  I haven’t really used it but I can see where it would be convenient for other folks.


With that said, I feel like my special browser is changing and becoming just another run-of-the-mill Chrome skin.  And I really am worried about how this change will affect the security of the browser.  I don't see how trying to be more like Chrome is going to get Opera a larger install base, but I do worry that changing and removing iconic features could drive away loyalists.  Ah well, this wouldn't be the first dud Opera has released, and a few steps back here will hopefully give way to some real innovation in the future.