Tom Looman

Adding Counters & Traces to Unreal Insights & Stats System

2026-03-19T00:00:00+00:00

The only sane way to optimize your game is to have good profiling metrics in your game code. Unreal Engine comes packed with several good profiling tools. Today I will cover the Stats System (controlled by Stat Commands) along with Unreal Insights, both of which let us measure pieces of our game in different ways. I will demonstrate how you can use these metrics to your advantage. The macros are slightly different for the Stats System and Unreal Insights, so we will cover both.

It is good practice to add metrics to key areas of your code early. Features may perform fine initially but degrade as content or code changes, and having profiling stats in place lets you quickly understand what’s going on.

To find code samples for each of these traces you can view the source code of Project Orion (Co-op Action Roguelike) on GitHub.

Types of Trace Metrics

The first available metric type is a cycle counter, which tracks how much time is spent in a certain function or “scope”. The second metric type is a simple counter, which is useful for tracking event frequencies or instance counts rather than measuring time.

You can find more macros in the following locations in the engine source:

Engine/Source/Runtime/Core/Public/ProfilingDebugging/CountersTrace.h (Counters for Unreal Insights)
Engine/Source/Runtime/Core/Public/ProfilingDebugging/CpuProfilerTrace.h (Cycle Counters for Unreal Insights)
Engine/Source/Runtime/Core/Public/Stats/Stats.h (the original “Stats System” for display in the game viewport; these stats can also be viewed in Insights, though some may require -statnamedevents to be enabled)

Counters

With Counters we can easily track how often something occurs, or other metrics such as instance counts of particular objects. You can use this information in a variety of ways, such as deciding on good pool sizes for certain Actors, or comparing instance counts against other metrics such as cycle stats to understand how frame performance scales with the number of a given object.

If you only need the counters in Insights and not as viewport stats, the available macros are slightly simpler to use.

Counters for Unreal Insights

Adding new Counters for Unreal Insights is very simple. Declare the following counter at the top of your cpp file:

TRACE_DECLARE_INT_COUNTER(CoinPickupCount, TEXT("Game/ActiveCoins"));

You can replace INT with FLOAT (TRACE_DECLARE_FLOAT_COUNTER) if you need decimal precision. You can now modify the defined Counter in game code with the following macros:

TRACE_COUNTER_SET(CoinPickupCount, CoinLocations.Num());
TRACE_COUNTER_ADD(CoinPickupCount, SomeNumber);
TRACE_COUNTER_SUBTRACT(CoinPickupCount, SomeNumber);
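Conceptually, an int counter is a single value that these macros modify and that the trace samples over time. The standalone sketch below is only an analogy of that bookkeeping, not Unreal code: `TraceIntCounter`, `ActiveCoins` and `Coin` are hypothetical names. It shows the instance-counting pattern of adding on creation and subtracting on destruction; for an Actor, one way to get the same effect would be calling TRACE_COUNTER_ADD in BeginPlay and TRACE_COUNTER_SUBTRACT in EndPlay.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an Insights int counter: one value that
// Set/Add/Subtract modify (analogy only, not the engine's implementation).
struct TraceIntCounter
{
    int64_t Value = 0;

    void Set(int64_t NewValue) { Value = NewValue; }
    void Add(int64_t Amount) { Value += Amount; }
    void Subtract(int64_t Amount) { Value -= Amount; }
};

// File-scope counter, mirroring TRACE_DECLARE_INT_COUNTER at the top of a .cpp.
TraceIntCounter ActiveCoins;

// Hypothetical pickup type: add on creation, subtract on destruction,
// so the counter always reflects the number of live instances.
struct Coin
{
    Coin() { ActiveCoins.Add(1); }
    ~Coin() { ActiveCoins.Subtract(1); }

    Coin(const Coin&) = delete;
    Coin& operator=(const Coin&) = delete;
};
```

Pairing every Add with a matching Subtract is what keeps an instance count honest; a missed Subtract shows up in Insights as a counter that only ever climbs.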

The counters can be viewed in the Counters tab of Insights. Keep in mind that the counters trace channel must be enabled for this data to be available, for example with -trace=counters (or -trace=default,counters to keep the default channels as well).

Counters for Stats System

The older Stats System works slightly differently, since every stat requires a StatGroup under which it is displayed (e.g. STATGROUP_Game). Stat groups are how stats are organized: type the console command stat game to show everything in STATGROUP_Game, or stat anim for everything under STATGROUP_Anim. Define your own stat group with the following macro:

DECLARE_STATS_GROUP(TEXT("My Group Name"), STATGROUP_MyGroupName, STATCAT_Advanced);

As an example, I track how many Actors get spawned during a session, so I added a counter to the ActorSpawned delegate available in UWorld.

At the top of the cpp file I declare the stat we wish to track. In the function that is triggered whenever a new Actor is spawned, we increment the counter.

// Keep track of the amount of Actors spawned at runtime (at the top of my class file)
DECLARE_DWORD_ACCUMULATOR_STAT(TEXT("Actors Spawned"), STAT_ACTORSPAWN, STATGROUP_Game);

// Increment stat by 1, keeping track of total actors spawned during the play session (Placed inside the event function)
INC_DWORD_STAT(STAT_ACTORSPAWN); //Increments the counter by one each call.
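An accumulator stat keeps its value across frames, which is why it works for a session total. Stats.h also offers DECLARE_DWORD_COUNTER_STAT, which is cleared every frame and therefore shows per-frame activity instead. The standalone analogy below models that difference; `IntStat` and `SimulateFrames` are hypothetical names, not engine code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of an integer stat (analogy only, not engine code).
struct IntStat
{
    int64_t Value = 0;
    bool bClearEveryFrame = false; // true ~ counter stat, false ~ accumulator stat

    // Comparable to INC_DWORD_STAT: bump the value by one.
    void Inc() { ++Value; }

    // End of frame: return the value shown for this frame, then clear it
    // only if the stat behaves like a per-frame counter.
    int64_t EndFrame()
    {
        const int64_t Shown = Value;
        if (bClearEveryFrame)
        {
            Value = 0;
        }
        return Shown;
    }
};

// Simulates NumFrames frames with SpawnsPerFrame actor spawns each and
// returns the value displayed on the final frame.
int64_t SimulateFrames(IntStat& Stat, int NumFrames, int SpawnsPerFrame)
{
    int64_t LastShown = 0;
    for (int Frame = 0; Frame < NumFrames; ++Frame)
    {
        for (int Spawn = 0; Spawn < SpawnsPerFrame; ++Spawn)
        {
            Stat.Inc();
        }
        LastShown = Stat.EndFrame();
    }
    return LastShown;
}
```

With 3 spawns per frame over 5 frames, the accumulator displays the session total of 15, while the per-frame counter displays 3.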

The above example is to track occurrences, but often you want to measure execution cost instead. For that we use cycle counters.

Cycle Counters

Cycle counters can track how much CPU time is spent within a certain function or scope.

Cycle Counters for Stats System

In the next example I want to measure CPU time spent getting “Modules” on the player’s Ship.

DECLARE_CYCLE_STAT(TEXT("GetModuleByClass"), STAT_GetSingleModuleByClass, STATGROUP_LODZERO);

AWSShipModule* AWSShip::GetModuleByClass(TSubclassOf<AWSShipModule> ModuleClass) const
{
	SCOPE_CYCLE_COUNTER(STAT_GetSingleModuleByClass);

	if (ModuleClass == nullptr)
	{
		return nullptr;
	}

	for (AWSShipModule* Module : ShipRootComponent->Modules)
	{
		if (Module && Module->IsA(ModuleClass))
		{
			return Module;
		}
	}

	return nullptr;
}

In the next section we’ll go into how these stats can be displayed on-screen using the two examples above.

Showing metrics in-game (Stat Commands)

Stats are toggled per StatGroup, and multiple groups can be on screen at once. To show a stat group, open the console window (~ Tilde) and type stat YourStatGroup, for example stat game or stat scenerendering.

Tip: To hide all displayed stats you can simply type: stat none.

Adding new profiling metrics to your game

As you can see it only takes a few Macros to set up your own metrics. The one missing piece is how to define your own StatGroup if you want to have a custom view for your stats using the Stats System.

DECLARE_STATS_GROUP(TEXT("LODZERO_Game"), STATGROUP_LODZERO, STATCAT_Advanced); 
// DisplayName, GroupName (ends up as: "LODZERO"), Third param is always Advanced.

Add the stat group to your game header so it can be easily included across your project. (eg. MyProject.h or in my case I have a single header for things like this called RogueGameTypes.h)

Finally, it’s important to note you can also measure just a small part of a function by using curly braces.

void MyFunction()
{
    // This part isn't counted

    {
        SCOPE_CYCLE_COUNTER(STAT_GetSingleModuleByClass);
        // ... Only measures the code inside the curly braces.
    }

    // This part isn't counted either; measurement stops at the closing brace above.
}
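The braces work because SCOPE_CYCLE_COUNTER declares an object whose lifetime is tied to the enclosing scope (RAII): timing starts when the object is constructed and stops when it is destroyed at the closing brace. The standalone sketch below models that idea with std::chrono. `CycleStat`, `ScopeCycleCounter` and `InnerStat` are hypothetical names, and the engine's real implementation records cycles rather than std::chrono durations.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for a cycle stat: total time and number of scopes.
struct CycleStat
{
    std::chrono::nanoseconds Total{0};
    int64_t Calls = 0;
};

// RAII helper: starts timing in the constructor and stops in the destructor,
// which runs when execution leaves the enclosing braces.
class ScopeCycleCounter
{
public:
    explicit ScopeCycleCounter(CycleStat& InStat)
        : Stat(InStat), Start(std::chrono::steady_clock::now()) {}

    ~ScopeCycleCounter()
    {
        Stat.Total += std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now() - Start);
        ++Stat.Calls;
    }

    ScopeCycleCounter(const ScopeCycleCounter&) = delete;
    ScopeCycleCounter& operator=(const ScopeCycleCounter&) = delete;

private:
    CycleStat& Stat;
    std::chrono::steady_clock::time_point Start;
};

CycleStat InnerStat;

// Mirrors MyFunction above: only the braced block is recorded.
int MyFunction(int Input)
{
    int Result = Input * 2; // not counted

    {
        ScopeCycleCounter Counter(InnerStat);
        Result += 1; // counted
    }

    return Result; // not counted either
}
```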

Named Events

Named Events are a special tracing option that adds extra detail, at a significantly higher overhead (I have heard numbers as high as 20%). They should not be used to gauge overall frame performance. Instead, they give powerful insight into your game code by including details such as which specific Actor or Class was running the traced logic. Where normally you might only know that some object in the frame ticked a CharacterMovementComponent, Named Events show you exactly which class, such as BP_PlayerCharacter, ticked the component.

To enable this level of detail, either specify -statnamedevents on the command line or type stat namedevents while running the game. Then add the SCOPED_NAMED_EVENT macro to your game code, as in the examples below.

SCOPED_NAMED_EVENT(StartActionName, FColor::Green);
SCOPED_NAMED_EVENT_FSTRING(GetClass()->GetName(), FColor::White);

The first parameter is the name as it shows up in Unreal Insights or the Stats System. The second is the display color, although Insights does not use this color by default.

The code below contains two variations: the first traces the entire function, while the second is placed within curly braces, which limits the trace scope to those braces. The _FSTRING variant lets us specify names at runtime, but it adds overhead, so use it with consideration.

bool URogueActionComponent::StartActionByName(AActor* Instigator, FName ActionName)
{
  // Trace the entire function below
  SCOPED_NAMED_EVENT(StartActionName, FColor::Green);

  for (URogueAction* Action : Actions)
  {
    if (Action && Action->ActionName == ActionName)
    {
      // Bookmark for Unreal Insights
      TRACE_BOOKMARK(TEXT("StartAction::%s"), *GetNameSafe(Action));

      {
        // Scoped within the curly braces. The _FSTRING variant adds tracing overhead because it fetches the class name on every call
        SCOPED_NAMED_EVENT_FSTRING(Action->GetClass()->GetName(), FColor::White);

        Action->StartAction(Instigator);

        // ... running more code, all captured by the named event
      }

      // ... this code is not included in the _FSTRING trace since it's outside the curly braces
      return true;
    }
  }

  return false;
}

Closing

(Cycle) Counters for both Unreal Insights and the Stats System are incredibly useful when used pragmatically and provide quick insight into your game’s performance. Add stats conservatively, as they are only valuable if they give you actionable statistics to optimize. They add a small performance overhead themselves (in non-shipping builds only), and any stat that is useless just adds to your code base and pollutes your stats view.

You might be interested in my other C++ Content or follow me on Twitter!


Setting up Rider for C++ and Unreal Engine

2026-02-13T00:00:00+00:00

In this article we will install JetBrains Rider for use with Unreal Engine 5 to set up your C++ development environment. These steps are for a fresh machine with Unreal Engine 5.6 installed. They should work with older and newer versions too, although Epic periodically bumps the required versions of the dependencies that we install today.

Note: The steps demonstrated are for Windows & JetBrains Rider. Please read this article carefully as each component requires specific versions depending on the Unreal Engine version you are using. Any wrong version or missing component and the code will fail to compile.

Required Software

JetBrains Rider
Epic Games Launcher and Unreal Engine 5.0+ installed
Visual Studio Build Tools (MSBuild) or direct link to VS_BuildTools.exe

Installing JetBrains Rider

Get the latest version of Rider for Windows from the JetBrains website. You can use the default settings during installation. The most important steps are in the Visual Studio Build Tools installation below.

After installing the Epic Games Launcher, I recommend running Unreal Engine from the Launcher at least once so it can install any prerequisites before moving on. Simply click “Launch” on your installed version and let it do its thing. You don’t need to keep it open afterwards.

If you are installing JetBrains Rider to follow along with my Unreal Engine C++ Course, you can select the Free license (“Rider Non-commercial”) as you are a student using it for educational purposes.

Installing Visual Studio Build Tools

To compile C++ projects for Unreal Engine you need the build tools from Microsoft even when not using Visual Studio.

You could install the full Visual Studio IDE, but since you will use Rider for editing your source code, you can install the Visual Studio Build Tools instead, which include the minimum set of components required for compilation.

You can use this direct link to download Visual Studio Build Tools or scroll down on the page to look for Visual Studio Build Tools under “Tools for Visual Studio” (scroll down quite a bit).

Required Individual Components

The Unreal Engine C++ build pipeline requires a very specific set of components to be installed. Please follow the instructions below carefully. I have also listed some possible errors you may encounter if you did not select the correct components. Selecting the wrong version of a component can cause your project to fail compilation.

For a list of recommended components to install, Epic maintains a list here. I will list all the versions below for Unreal Engine 5.6. Every few releases of Unreal these are bumped to a more recent component version.

A very specific set of components in their correct version must be selected for Unreal Engine to compile correctly.

Go to “Individual Components” tab in the Visual Studio Installer and select the following components:

Windows 11 SDK (10.0.26100.3916) or higher
.NET Framework 4.8.1 SDK or higher
MSVC v143 - VS 2022 C++ x64/x86 build tools (v14.38-17.8)

The above versions are for UE 5.6, if you are using a different version check the current list on the official docs.

Creating a C++ Unreal Engine Project

If you do not have an Unreal Engine C++ project yet, you can create one via the Unreal Project Browser. A blank C++ project is enough to validate your installation.

Games > Blank > C++ (right panel)

Opening your C++ Project in Rider

You can open your Unreal Engine MyProject.uproject file directly in Rider. Either drag and drop the .uproject file into the Rider app, or browse to the file from inside Rider by clicking “Open” in Rider’s main window.

Note: You don’t need to use any generated solution files (.sln) as with Visual Studio. It is recommended to only use the .uproject file for Rider as this causes fewer issues and automatically syncs any changes made to project structure.

After opening the project, Rider will generate the “Unreal Engine Project Model” in the background. You should give Rider some time to process and index the engine files. This may take a while and the indexing will help the autocompletion and navigation of the source files.

Compile your project

Try to compile your project to ensure all components are installed correctly. If any error occurs, check the Troubleshooting section below.

Main menu > Build > Build Startup Project

Access the main menu in the top-left

Select Build Startup Project (which should be your game project) to verify the installation has succeeded.

Configuring Rider as the “Source code editor” in Unreal Editor

You should set Rider as the Source Code Editor in Unreal Editor so that your Blueprint can immediately jump to C++ code when you double click the nodes or when you right-click “Go to Definition”.

In the Editor Preferences > General > Source Code > Source Code Editor set it to “Rider uproject”

Installing RiderLink

You will be prompted when launching Rider with an Unreal Engine project to install RiderLink.

I recommend installing the RiderLink plugin to the Engine. This is a super powerful tool for seeing how your project and Blueprints use your C++ code, such as which Blueprint has changed a variable default or overrides a function.

Find the Notifications tab on the top-right to install RiderLink to Engine (recommended)

Windows Defender Exclusions

Make sure to accept the Windows Defender exclusions. The pop-up appears in the bottom-right on first launch. This avoids the overhead of Defender constantly scanning your files. Do so at your own risk, but you should have clear control over your own build output.

If you previously ignored the pop-up, you can still find it inside the Notifications Tab in the top-right.

Errors & Troubleshooting

The following errors all require the Visual Studio Installer and selecting the specified components under “Modify”. These errors will not happen if you selected the correct component versions during the initial installation steps.

Even when installing JetBrains Rider, the Visual Studio installer is still an important part of the overall setup.

Error: No valid Visual C++ toolchain was found

Error message: “No valid Visual C++ toolchain was found (minimum version 14.38.33130, preferred version 14.38.33130). Please download and install Visual Studio 2022 17.8 or later and verify that the “MSVC v143 - VS 2022 C++ x64/x86 build tools (v14.38-17.8)” component is selected in the Visual Studio 2022 installation options.”

Steps:

Go back into your Visual Studio Installer (same name if you have installed Build Tools instead of full Visual Studio).
Click “Modify” in the Visual Studio Installer, see screenshot above.
Go to Individual Components
Search for “MSVC v143 - VS 2022 C++ x64/x86 build tools (v14.38-17.8)”.
- Check your error message: earlier or later versions of Unreal Engine may require a different version.

Error: No available Windows SDKs found

Another error you may receive if you try to run the project is “No available Windows SDKs found” or “Windows SDK must be installed in order to build this target.”

Steps:

Go back into your Visual Studio Installer (same name if you have installed Build Tools instead of full Visual Studio).
Click “Modify” in the Visual Studio Installer, see screenshot above.
Go to ‘Individual Components’
Search for: Windows 10 or 11 SDK (10.0.18362 or newer)
- See the documentation page on which SDK version is required for your Unreal Engine version.

Error: Install a version of .NET Framework SDK at 4.6.0 or higher

If you try to launch again, you may hit another error: “Install a version of .NET Framework SDK at 4.6.0 or higher”. Generating the Rider project files will fail with several module errors, all asking you to install the .NET Framework SDK.

Steps:

Go back into your Visual Studio Installer (same name if you have installed Build Tools instead of full Visual Studio).
Click “Modify” in the Visual Studio Installer, see screenshot above.
Go to Individual Components
Search for “.NET Framework 4.8.1 SDK”

Installing Editor Symbols for debugging (Optional)

In order to debug and use breakpoints in the engine source code (you don’t need this to debug your own game code), you need to install the Editor Symbols for debugging in the Epic Games Launcher.

In the Epic Games Launcher, go to Unreal Engine > Library and click the arrow next to Launch to select Options. From there you can enable “Editor symbols for debugging”. It takes up a large amount of disk space, so you may want to leave this for later.

The options can be used to modify your engine installation.

Debugging Symbols can take up to 50GB of disk space (download size is significantly less).

Recommended Rider Settings

The following settings are my personal recommendations. They are not required to compile or use Unreal Engine, so try them out first to see whether you want to keep them.

Settings can be accessed in the top-left under File > Settings.

Indexing Plugins

By default “Plugins” will not be indexed, and Rider considers many common modules of the engine to be Plugins, including Enhanced Input and Niagara. I recommend enabling this, or they won’t show up in autocompletion and code searches.

Enable indexing of Plugins to get better coverage of the engine source code.

Without Plugin indexing, certain parts of the engine won’t have any highlighting and do not show up in searches.

By default the function parameter info popup is delayed by 1000ms. I find this too slow, but it can be changed. Tune it to something that feels more responsive.

The parameter info popup is this little window when you start typing function parameters.

Change it to something less than 1000ms. All the way down to 0ms if you prefer no delay at all.

Preference: Turn off “Reader Mode”

Reader Mode enables “rendered comments”, which provide much nicer looking function comments in header files. The problem is that they can take a bit of time to render as intended, showing the standard comments when first opening the file. I prefer this off, keeping things consistent and avoiding comments that change halfway through reading them. For Unreal Engine this only applies to the engine source; your own code is not shown in Reader Mode.

Preference: Turn off “Code Folding” on Imports

Code folding can automatically collapse the list of #includes (“Imports”) at the top of the file. You may like it, but I prefer to see them at all times, mainly to make sure they are visible to students during the course lessons, but also to keep an eye on includes that are no longer used so I can remove them (Rider renders them in grey when nothing uses the include).

Editor > General > Code Folding > “Imports”

Preference: Turn off Full Line completion suggestions

Full line code completions can be very useful, but personally I prefer not to have the added cognitive load of constantly checking the suggestions and reasoning whether they match what I intended, especially to avoid distractions while recording tutorial/course content. Try it out and see whether you like it; otherwise, you can easily disable it in Rider.

Editor > General > Inline Completion > “Enable local Full Line completion suggestions”

Preference: Turn off Hard Wrap Visual Guide

You can turn off the white line in the text editor that is called the Hard Wrap. This is where code will wrap during formatting and code generation. You can turn off the visual style under Editor > General > Appearance > “show hard wrap and visual guides”. To actually disable the hard wrap behavior itself, it can be found in the Code Style as “Hard Wrap”.

To keep a clean and minimalist UI, I prefer to remove any buttons I won’t be using. In my case that includes things like JetBrains AI and Code with Me, but you may of course wish to keep those and remove some others. Simply right-click the toolbar and click “Customize Toolbar”.

Setting your HotKeys & Theme

During my tutorials and courses I use the Visual Assist keymap and visual theme. You are of course free to pick whatever you are most comfortable with coming from any particular source code editor prior to using Rider.

Closing

Having trouble? If you had any issues during the setup process that were unclear or not covered in this article, let me know through my contact form and I will see if I can update the article.

If you are one of my students, you are now ready to follow along with my Unreal Engine C++ course! You might also be interested in checking out my Complete Guide to Unreal Engine C++ article as a companion reference and introductory guide to many of the important concepts to programming within Unreal Engine 5.

C++ Course Completely Rebuilt for Unreal Engine 5 (Early Access)

2026-02-09T00:00:00+00:00

The Professional Game Development in C++ and Unreal Engine course has received the first 12+ hours of major overhaul content (total course length is expected to be roughly 25 hours) to bring it up to Unreal Engine 5.6 and above! The course is being completely rebuilt and re-recorded from the ground up. It further streamlines the curriculum and covers both big and small features of Unreal Engine 5 such as Enhanced Input, Niagara (for gameplay), InstancedStructs, Data-oriented design, and lots more…

Free Upgrade

This Unreal Engine 5 version is a FREE upgrade for all current students. It has taken many months of work and it is far more than a re-recording of the original. Instead, every single line of code has been reviewed and considered, with some content cut and replaced. This resulted in a streamlined lesson plan and a chance to swap out some repetitive bits of the original for more exciting concepts that are either new to UE5 or simply did not make it into the original curriculum for any number of reasons.

You are automatically enrolled into the Early Access Refresh when purchasing the original course. The landing pages will be swapped out once the refresh hits the full “v2.0” launch.

What’s Changed?

Every lesson is re-recorded and redesigned. The overall structure of the lessons has improved. Certain (gameplay) mechanics have been swapped out to allow more interesting programming concepts to be covered instead.

Not only does it refresh the content for Unreal Engine 5, it also includes additional programming concepts such as Data Oriented Programming. Rather than the usual OOP (Object Oriented Programming) approach built around objects such as Actors, it focuses on the data first. This allows for significantly higher performance in certain scenarios and can often keep code simpler at the same time!

The general design of the game you will build during the course remains largely the same. A third-person action RPG/Roguelike with a custom GAS-like system, enemy AI and full multiplayer support.

Available lesson plan at the time of writing:

Project Setup - configuring a character for third-person and using Enhanced Input.
Creating a projectile based attack including Animations, Niagara, MetaSounds, delegates, damage, etc.
Assignment 1: Jumping and building an Explosive Barrel
Interaction System, collision queries, interfaces.
Blueprint scripting on top of our C++ to build out mechanics
Assignment 2: Blackhole & Teleport Projectiles
Player Attributes - setting up Health, death, delegates and UI
Debugging tools, including CVARs, asserts
Applying polish into the systems
Assignment 3: Health Pickup
Enemy Monsters: building out AI using Behavior Trees (StateTrees used elsewhere)
Assignment 4: Flee & Heal behaviors for enemies
Action System: building our own GAS-like including Actions, Buffs/debuffs, Attribute handling.
Using GameplayTags in-depth.
…remaining lessons uploaded on a weekly schedule

UE4 Course Access

The original “Unreal Engine 4” version will remain available for new and existing students. If you are currently following that version of the course, you can continue to do so without interruption. The course refresh exists as its own course product on the platform.

Price Adjustment

As a reminder, once the full refresh is completed, the price of the course will be adjusted. The projected price is $395,- for the Indie and $795,- bringing it in line with the Complete Game Optimization for Unreal Engine 5 Course and still sits well below the inflation rate over the past 4 years since the original course launched.

Early Impressions from Students

Feedback from students with early access has been overwhelmingly positive! This gave me confidence that I am on the right track for the remainder of the content. A few snippets below to give you an impression:

“This has so far been the best course I have ever taken that is better than anything else I’ve seen. I have been working with UE since 4.26 and I was doubtful coming to this course that ‘What if I already know this stuff’ but oh my god. Since the first lecture I am learning bits and shortcuts for things I have used so much of my time on. From switching to Rider, to different implementation methods of BPs and callbacks, to debugging, with actual usage of assets rather than some UE_LOG with some numbers.” - Pouya N.

“This is the best Unreal Engine 5 Course hands down. I now feel like I am comfortable enough to start a new project and just start working on a small concept to keep learning and running into problems. I like that you purposely leave things simple until you run into an issue where you need to change something or add to it. Running into those problems is what reinforces learning, at least for me. I love the links you post with each video they are super helpful when something works but I don’t really have a good understanding of it. It really shows you how you build a system for a game from scratch. Other courses just do too much at once often leaving the student asking why something was done this way.” - Adrian L.

“…This new course refresh looks incredible, thanks so much for all your hard work!” - Joe P.

What’s Next?

In the next few weeks or months the remainder of the lessons will be completed. You can check back regularly as every week new lessons are published. This refresh has been a long time coming, and I am excited to finally be so close to finishing and making it available to all students! You can support me by telling your friends and colleagues about the course and the new update!

Unreal Engine 5.7 Performance Highlights

2025-11-12T00:00:00+00:00

It is time for another Unreal Engine 5.7 Performance Highlights post! I have compiled a list of the most impactful changes that make it worthwhile to upgrade to 5.7. I trimmed the list more substantially this time around to make it more digestible and really highlight the most interesting areas while keeping the more minor changes out. I have included annotations and clarifications not found in the original release notes.

As usual I approached this list from the game development perspective. Focusing on runtime performance of the game, profiling capabilities, bugs that affected performance, new CVARs for quality/performance tuning and some of the editor iteration performance as those have some notable changes.

There are some major improvements like a new Nanite Foliage system using voxels which is a game changer and a much desired improvement for foliage rendering. Lumen continues moving away from Software ray-tracing (SWRT) by deprecating their “SWRT detail traces” render path and focusing on getting hardware raytracing to run at 60hz. MegaLights is moving into Beta, we can now inject Custom HLODs to give us greater control for distant geometry and more unusual changes such as optimizations to Windows high-precision mouse handling. Read the original full release notes here

This article is part of my efforts to keep Unreal Engine developers informed about Game Optimization! For that I have an in-depth Game Optimization Course for Unreal Engine 5 that teaches engineers and tech artists everything they need for profiling, optimizations and understanding performance in UE5.

Nanite

Added a new culling check that can improve Nanite culling speed and reduce the amount of memory needed for candidate clusters (r.Nanite.Culling.MinLOD enabled by default, turn off for testing only). My understanding of this culling is that it can skip child clusters during culling, simply put, we get faster culling that’s enabled by default.

Added experimental and optional passes to prime the HZB before VisBuffer rendering if the HZB is missing (e.g., due to a camera cut), see cvar r.Nanite.PrimeHZB et al.

The main idea is to draw at a lower resolution (HZB resolution or lower) and with lower detail (by using LOD bias and/or drawing only ray tracing far field scene geometry).
This new render is then used to build an HZB with some depth bias, which is then used to render the full scene, greatly reducing the cost in many cases.
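To make the HZB idea concrete, here is a generic sketch of how a hierarchical Z-buffer is built and queried. It is not Unreal's implementation: `DepthMip`, `Downsample` and `IsOccluded` are hypothetical names, it assumes a 0 = near / 1 = far depth convention (UE itself uses reversed-Z), it applies the depth bias at test time rather than baking it into the HZB, and it tests a single texel where a real implementation picks a mip level at which the object's screen bounds cover only a few texels.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One level of a depth pyramid, row-major. Assumed convention: 0 = near, 1 = far.
struct DepthMip
{
    int Width = 0;
    int Height = 0;
    std::vector<float> Depth;

    float At(int X, int Y) const { return Depth[Y * Width + X]; }
};

// Build the next mip: each texel keeps the FARTHEST depth of its 2x2 footprint,
// so testing against it stays conservative (it never hides a visible object).
DepthMip Downsample(const DepthMip& Src)
{
    DepthMip Dst;
    Dst.Width = std::max(1, Src.Width / 2);
    Dst.Height = std::max(1, Src.Height / 2);
    Dst.Depth.resize(Dst.Width * Dst.Height);

    for (int Y = 0; Y < Dst.Height; ++Y)
    {
        for (int X = 0; X < Dst.Width; ++X)
        {
            float Farthest = 0.0f;
            for (int DY = 0; DY < 2; ++DY)
            {
                for (int DX = 0; DX < 2; ++DX)
                {
                    const int SX = std::min(X * 2 + DX, Src.Width - 1);
                    const int SY = std::min(Y * 2 + DY, Src.Height - 1);
                    Farthest = std::max(Farthest, Src.At(SX, SY));
                }
            }
            Dst.Depth[Y * Dst.Width + X] = Farthest;
        }
    }
    return Dst;
}

// An object whose NEAREST depth lies behind the farthest occluder depth in the
// covering texel cannot be visible. DepthBias pushes occluders farther away,
// so a low-detail priming pass errs on the side of drawing the object.
bool IsOccluded(const DepthMip& Mip, int X, int Y, float ObjectNearestDepth, float DepthBias)
{
    return ObjectNearestDepth > Mip.At(X, Y) + DepthBias;
}
```

For example, if a 4x4 depth buffer is 0.5 everywhere except one texel at 0.9, the 2x2 mip stores 0.9 for the quadrant containing that texel. An object at depth 0.7 is culled in the other quadrants but kept in that one, and a depth bias of 0.25 keeps it visible everywhere.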

Epic has talked about priming the HZB for culling during camera cuts in The Witcher 4 Unreal Fest talk, that will best explain this optimization.

Exposed NanitePixelProgrammableDistance for Nanite skinned meshes to enable forcibly disabling pixel programmable rasterization of Nanite when the mesh is further than a given distance from the camera.

Nanite Foliage and Skinning (Experimental)

Nanite Foliage is a huge step forward for performant foliage rendering with Nanite. You will now build foliage from building blocks (like a variety of branch meshes to build a tree) instead of importing a single tree or chunk of grass.

Nanite foliage is now animated using Skinning rather than WPO (world position offset) which has poor Nanite support (WPO is generally not suitable for Nanite due to its increased rendering cost).

This primarily added the following features:

Nanite Assemblies, for building foliage assets using instanced parts to keep asset sizes small.
Skeletal Mesh-driven dynamic wind, trees are skeletal meshes and the new Dynamic Wind plugin enables procedural wind to affect their bones.
Nanite Voxels, for efficiently rendering foliage geometry in the distance.
TSR Thin Geometry detection to better handle foliage.

Again Epic demo’d this during The Witcher 4 talk here.

Lumen

Lumen continues to move toward a single rendering path with HWRT (Hardware raytracing) at 60hz. Epic already deprecated the SWRT detail tracing in 5.6 and continues their efforts on the HWRT side. The good thing about this is a unified lighting solution that we can work against, rather than having to pick one and hope for the best or spend extra time supporting both. Their continued performance improvements to Lumen Hardware Raytracing (HWRT) should allow higher quality and more stable lighting that no longer relies on simplified Distance Field representations.

Enabled half-res integration on High scalability (r.Lumen.ScreenProbeGather.StochasticInterpolation 2).

This can soften normals in indirect lighting and make GI a bit noisier.
On the other hand, it saves ~0.5ms on console at 1080p, which feels like the right tradeoff for High scalability, intended for 60 Hz on console.

r.LumenScene.DumpStats 3 will now dump primitive culling information. This is useful, as it merges instances into a single entry, making the log easier to read.

Tweaked default settings:

r.Lumen.TraceMeshSDFs 0 - SWRT detail traces (mesh SDF tracing) are now a deprecated path (Note: deprecated since 5.6) that won't see much further work. To scale quality beyond SWRT global traces, it is recommended to use the HWRT path instead.
r.Lumen.ScreenProbeGather.MaxRayIntensity 10 - firefly filtering is now more aggressive by default. This removes some interesting GI features, but also reduces noise, especially from things like small bright emissives.

Moved GBuffer tile classification into a single pass, which also stores opaque history for the next frame. This pass is reused across Lumen and MegaLights for better performance.

Screen tile marking optimizations, which speed up reflections, GI, and water reflections.

MegaLights

MegaLights has moved from Experimental to Beta. From the release notes, this appears to mostly mean reduced noise and overall performance improvements.

Implemented MegaLights-driven Virtual Shadow Mapping marking to only mark VSM pages that MegaLights has selected samples for.

Added r.MegaLights.DownsampleCheckerboard, which can run sampling/tracing at half res (every other pixel). It's a good middle ground between the default quarter-res sampling and optional full-resolution sampling.

Merged downsample factor / checkerboard CVars into a single r.MegaLights.DownsampleMode CVar.

Exposed r.MegaLights.HardwareRayTracing.ForceTwoSided, which allows flipping between matching raster behavior and forcing two-sided on all geometry to speed up tracing.

Now always vectorize shading samples. This saves on average 0.1-0.2ms in complex content on current gen consoles.

Now uses downsampled neighborhood for temporal accumulation. Interpolated pixels don’t add much to neighborhood stats, so we can skip them, improving quality by effectively using a wider neighborhood filter. This also improves performance of the temporal accumulation pass, as it now can load less data into LDS.

Now merge identical rays in order to avoid overhead of duplicated traces. Duplicated rays happen with strong point lights, where we may send a few identical rays to the same light doing unnecessary work.
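To make the merging step concrete, here is a minimal standalone C++ sketch. It assumes a ray is fully identified by the light it targets plus the sample position picked on that light; `ShadowRaySample`, `MergedRay` and `MergeIdenticalRays` are hypothetical names, not MegaLights code (which runs on the GPU).

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical per-pixel sample: which light was picked and which point on it.
struct ShadowRaySample {
    int lightId;
    int lightSampleIndex;
};

// One traced ray plus how many samples it stands in for.
struct MergedRay {
    int lightId;
    int lightSampleIndex;
    int weight;
};

// Collapses identical samples so each unique ray is traced only once.
// The weight lets shading reuse the single visibility result for every
// sample that requested the same ray.
std::vector<MergedRay> MergeIdenticalRays(const std::vector<ShadowRaySample>& samples) {
    std::map<std::pair<int, int>, int> counts;  // (lightId, sampleIndex) -> count
    for (const ShadowRaySample& s : samples) {
        ++counts[std::make_pair(s.lightId, s.lightSampleIndex)];
    }
    std::vector<MergedRay> rays;
    rays.reserve(counts.size());
    for (const auto& [key, count] : counts) {
        assert(count > 0);  // every map entry came from at least one sample
        rays.push_back({key.first, key.second, count});
    }
    return rays;
}
```

In this sketch, four samples where three pick the same point on light 1 collapse into two traces, with the one visibility result reused three times.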

Now require ray tracing platform support, in order to skip compiling and cooking MegaLights shaders on certain platforms.

Custom HLODs

When using the provided automatic HLOD generation methods, the output may not always meet the project’s requirements and is bound to mesh components. Custom HLODs support addresses this limitation by giving you the ability to provide your own custom HLOD representations for individual actors or groups of actors.

You can use the new World Partition Custom HLOD actor class in two ways:

Inject custom representation directly: You can inject the custom representation as-is into the HLOD runtime partition and optionally use it as input for parent HLOD Layers.
Provide a custom source only: You can use the custom representation as input to the HLOD generation process without adding it to the world itself.

Custom HLOD support:

Added WorldPartitionCustomHLODActor, an actor you can place in the world to provide a custom HLOD representation (using static mesh component).
Added a new HLOD Layer type: “Custom HLOD Actor”.
Custom HLOD actors assigned to “Custom HLOD Actor” layer are injected as-is into the HLOD runtime partition.
The “Custom HLOD Actor” layer can specify a parent layer. In that case custom HLOD actors are also used to generate parent layer’s HLOD representation.
Custom HLOD actors can also be assigned to other (non-Custom) HLOD Layers. In that case they are used only during HLOD generation and are not pulled into the HLOD partition themselves.
Adds a new LinkedLayer property to UHLODLayer, visible only when LayerType is set to Custom HLOD Actor
LinkedLayer is used to control the visibility of Custom HLOD Actors. They become visible when actors from the LinkedLayer are unloaded.
If LinkedLayer is not specified, Main Partition is used to control the visibility.

Windows (Mouse/Cursor)

This release fixed some issues with mouse/cursor handling on high polling-rate mice, which are scattered across the release notes. It boils down to removing some CPU stalls by moving work onto a worker thread.

Updated mouse capture and high precision mode handling on Windows to save up to 0.5ms on redundant API calls
Reduced mouse input overhead on Windows by processing raw mouse moves on a separate FRunnable thread in the WindowsApplication.
Moved Win32 calls for mouse cursors off of the game thread and onto the task graph to reduce stalls on the game thread from internal lock contention that occasionally occurs.
Moved processing of mouse inputs to a dedicated thread to reduce overhead on the game thread for Windows. This is useful for mice with high polling rates, e.g. 8000 Hz.

This behavior is enabled by default, but you can go back to processing only on the main game thread with this console variable:

[ConsoleVariables]
WindowsApplication.UseWorkerThreadForRawInput=false

SMAA (Experimental)

Mobile and desktop renderers now support Subpixel Morphological Anti-Aliasing (SMAA). You can enable it with the mobile renderer using the CVar r.Mobile.AntiAliasing=5, and with the desktop renderer using the CVar r.AntiAliasingMethod=5.

This feature improves visual fidelity by providing high-quality edge smoothing with minimal performance impact, making it an efficient technique for mobile games.

SMAA anti-aliasing is enabled on all platforms.
Adjust quality settings using r.SMAA.Quality
Adjust edge detection between color or luminance using r.SMAA.EdgeMode
Improve edge smoothness without heavy GPU cost.

Currently an experimental feature. The quality settings allow tuning the tradeoff between performance and visual fidelity.

Chaos Physics

Simulation performance improvements:
- More parallel simulation stages.
- We added the Experimental Partial Sleeping feature for improved performance of large unstructured piles. Note: there are a variety of CVARs available to tune this, check the full release notes for a list and explanation.
Query performance improvements:
- Improvements to sphere and capsule narrow-phase.
- Improvements to the general query layer transform and scale handling.

Added a new p.Chaos.MinParallelTaskSize cvar and CVarMinParallelTaskSize function to set a threshold of tasks to run in parallel, which can improve performance on low-core platforms.

Enabled some p.Chaos cvars (like p.Chaos.MaxNumWorkers) in Shipping so they can be used to tweak game-specific performance.

Chaos Cloth

Optimizations: Epic implemented optimizations to the cloth game thread and interactor tick.

Chaos Visual Debugger

They improved the editor performance of this tool. I want to highlight it here because it is a powerful tool for inspecting the collision and physics configuration of your map. You can inspect where and how often you are performing queries, and you might be surprised how much unnecessary collision data you have loaded in your maps!

Performance improvements: In this release, we continued working on improving CVD's performance. CVD files with a large amount of data (specifically over 2 GB of collision geometry data) now load up to 30% faster.

Improved batching of TEDS operations performed when new physics body data is loaded in the scene. This change, combined with some other improvements made in TEDS itself, resulted in a ~75% reduction in processing time when the first frame of a large CVD recording (90,000+ objects) is loaded. The stall in this particular case went down from ~12 seconds to ~3 seconds.

Improved tracing performance by removing the need for locks (and reducing contention on the ones we could not remove) in some heavily multithreaded paths, mostly in collision geometry and physics body metadata serialization.

I assume they mean "profile tracing" performance here, not any kind of collision traces…

Slate UI

Slate drawing optimization: fixed float comparison of the alpha that could send an invisible element to the renderer, for negative or small alpha values.

CommonUI:

Updated CommonWidgetCarousel to be more consistent in its widget caching.
Added an option to determine whether or not to cache widgets at all
Old behavior was to cache widgets in the carousel’s RebuildWidget, but widgets added at runtime via AddChild would never be cached.
Currently caching is enabled by default to match the behavior when children were added at design-time, though this could result in increased memory usage in use cases where all widgets were added dynamically.

Optimized the font shaping process when using complex shaping (typically for Arabic).

Added a CVar Slate.UseSharedBreakIterator to support shared ICU Break Iterators to reduce CPU usage. False by default.

Note: These iterators are used to find line ends, sentences etc. in text.

Garbage Collection

Improved accuracy of GC time limit by setting object destroy granularity to 10 (instead of 100) and provide a way for it to be tweaked per project with gc.IncrementalBeginDestroyGranularity.
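Granularity affects accuracy because the time limit is only checked every N objects, so the last batch can run past the budget. A minimal sketch of that idea, using simulated microsecond costs so it is deterministic; `DestroyWithinBudget` is a hypothetical helper, not the engine's GC code:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct DestroyResult {
    int destroyed = 0;
    int64_t elapsedUs = 0;
};

// Destroys objects until the time budget is exceeded, but only checks the
// budget every `granularity` objects, since reading the clock per object is
// itself a cost. Times are simulated in microseconds for determinism.
DestroyResult DestroyWithinBudget(const std::vector<int64_t>& objectCostsUs,
                                  int64_t budgetUs, int granularity) {
    assert(granularity > 0);
    DestroyResult result;
    for (int64_t costUs : objectCostsUs) {
        result.elapsedUs += costUs;  // stand-in for destroying one object
        ++result.destroyed;
        // The budget check only happens at granularity boundaries.
        if (result.destroyed % granularity == 0 && result.elapsedUs >= budgetUs) {
            break;
        }
    }
    return result;
}
```

With 500 objects at 100 µs each and a 1 ms budget, a granularity of 100 runs for 10 ms before the first check, while a granularity of 10 stops at exactly 1 ms.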

Fixed a bug in GC timing where if it destroyed exactly 100 objects in a frame it could cause a large stall.

Extended gc.PerformGCWhileAsyncLoading to provide the option to garbage collect while async loading, only when low memory is detected (based on gc.LowMemory.MemoryThresholdMB).

Made -VerifyGC and -NoVerifyGC take all verification cvars into account.

Note: Previously we needed to still specify several CVARs to fully disable all verification in certain GC passes. These included: gc.VerifyAssumptionsOnFullPurge, s.VerifyUnreachableObjects, s.VerifyObjectLoadFlags

Multi-threading

Epic made many improvements for multi-threading concepts such as mutexes. I have compiled a few here that may be of interest, but there are more in the full release notes. You can find them under the “Core” sections.

Added Async.ParallelFor.DisableOversubscription which if enabled will only use existing threads which can be faster when cores are saturated.
Fix potential deadlock in the scheduler if a foreground task is scheduling a background one.
Fix potential deadlock in object annotations because of 2 different locks used in different order.
Use Yield() instead of YieldThread() when spinning in mutexes to improve performance under heavy system load.

Gameplay

Added QueueWorkFunction and SendQueuedWork to the TaskSyncManager which can be used to queue a batch of function pointers to execute as a simple tick task. This can be used with RegisterTickGroupWorkHandle to safely schedule batched gameplay thread work from any worker thread.

This system was added in 5.6: “Added a new (experimental) TaskSyncManager to the engine which allows registration of globally accessible tick functions that can be used to synchronize different runtime systems and efficiently batch per-frame updates.”
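The underlying pattern (worker threads append to a shared batch, and a single tick on the game thread drains it) can be sketched in plain C++. `QueuedWorkBatch`, `Enqueue` and `RunBatch` are hypothetical names; this is not the TaskSyncManager API, whose actual signatures are in the engine source.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for "queue work from any thread, run it in one tick".
class QueuedWorkBatch {
public:
    // Safe to call from any worker thread.
    void Enqueue(std::function<void()> work) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(work));
    }

    // Called once per frame on the game thread. The batch is swapped out under
    // the lock and executed without holding it, so workers are never blocked
    // by the work itself. Returns how many functions ran.
    size_t RunBatch() {
        std::vector<std::function<void()>> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            batch.swap(pending_);
        }
        for (auto& work : batch) {
            assert(work);
            work();
        }
        return batch.size();
    }

private:
    std::mutex mutex_;
    std::vector<std::function<void()>> pending_;
};
```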

Implemented SlideAlongNavMesh option for Character Movement Component NavWalking mode. This means pawns in NavWalking mode can move along a navigation mesh rather than just moving towards a projected point on the navmesh.

Core

Upgraded to Oodle 2.9.14 (compression). Some interesting changes I found include:

Bugfix: Detect Intel 13th/14th gen Core CPUs and work around instruction sequences implicated in high crash rates w/o microcode patches. Can be up to 20% slower on affected machines for certain rare inputs, but typical decoder slow-downs are around 0.5%. No perf impact on other machines.
- Note: This fix may be significant for your game: even when you are not explicitly using Oodle to compress your packages, it may affect shader compression and could be causing crashes if you have a live UE5 game.
Substantially improved Kraken “Fast” compression for large input blocks (much faster for binary data and typically smaller), slightly faster Kraken “Normal” compression (~5% higher throughput is common).
Leviathan “Fast” compression for large input blocks is typically faster and higher ratio.

Added UE_REWRITE, a replacement for FORCEINLINE intended to be used on macro-like functions (rather than for optimization) by ‘rewriting’ a function call as a different expression.

Introduced a mechanism for render asset updates to be abandoned by owning UStreamableRenderAssets so UStreamableRenderAssets can be garbage collected independently (See r.Streaming.EnableAssetUpdateAbandons). Abandoned render assets are ticked during GC or every streaming update if r.Streaming.TickAbandonedRenderAssetUpdatesOnStreamingUpdate is true (defaults to true).

Developers can change the default behavior of TArray<> and related containers to behave more like traditional C++ containers, so that they will only compact when Shrink() is explicitly called. Use the FNonshrinkingAllocator as your array allocator to request this behavior. This can also be passed as the secondary allocator for TInlineAllocator.

World Building

FastGeo: Moved FastGeoContainer PSO Precaching from PostLoad (GameThread) to be asynchronous.

FastGeo is Epic's new (5.6+) static geometry streaming solution, which avoids actors/components altogether in an effort to greatly improve streaming performance.

Windows

Changed thread priorities for Game, Render, and RHI threads from Normal to AboveNormal on Windows.

Lighting

Enabled VSM receiver masks for directional lights by default.

~10MB memory overhead (depending on settings/scalability).
Often fairly significant performance improvements in uncached cases with lots of dynamic geometry.

Epic showcases these VSM improvements in their Witcher 4 talk at Unreal Fest.

Scale the light fade range by GLightMaxDrawDistanceScale to avoid lights becoming too dim at lower scales. (previously Epic only scaled the MaxDrawDistance, not the MaxDrawFadeRange)

Added and immediately deprecated r.LightMaxDrawDistanceScale.UseLegacyFadeBehavior_WillBeRemoved to temporarily restore previous behavior. This CVAR relates back to the change with light fade range above to keep/compare original fading behavior.
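To see why scaling only the max draw distance made lights look too dim, here is a small sketch that assumes a simple linear fade over the last `fadeRange` units before the max draw distance. The formula and function names are illustrative assumptions, not the engine's shader code.

```cpp
#include <algorithm>
#include <cassert>

// Assumed linear distance fade: 1 well inside the range, 0 beyond
// MaxDrawDistance, ramping down across the last FadeRange units.
float DistanceFade(float distance, float maxDrawDistance, float fadeRange) {
    assert(fadeRange > 0.0f);
    return std::clamp((maxDrawDistance - distance) / fadeRange, 0.0f, 1.0f);
}

// Legacy behavior as described: only the max draw distance is scaled.
float FadeLegacy(float distance, float maxDraw, float fadeRange, float scale) {
    return DistanceFade(distance, maxDraw * scale, fadeRange);
}

// New behavior as described: the fade range is scaled as well.
float FadeScaled(float distance, float maxDraw, float fadeRange, float scale) {
    return DistanceFade(distance, maxDraw * scale, fadeRange * scale);
}
```

With a max draw distance of 1000, a fade range of 500 and a scale of 0.5, the legacy path fades a light 300 units away to 0.4, because the unscaled 500-unit fade band now covers the entire 500-unit draw range. Scaling both values keeps it at 0.8.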

Exposed Max Draw Distance and Max Distance Fade Range to Blueprint.

These properties fade out lights at a distance as a form of culling. Now you can configure them more easily in Blueprint, such as in Construction Scripts, which is potentially very helpful.

Added TexCreate_3DTiling flag to volume textures in TranslucentLighting to reduce memory and boost performance on some platforms.

Marked r.UseClusteredDeferredShading and r.Mobile.UseClusteredDeferredShading as deprecated and added a notification about future removal. Given its limited utility, clustered deferred shading will be removed in a future release to reduce maintenance burden.

Materials and Shaders

Added an asset registry tag to find material instances causing shader permutations.

This one is best explained by the Unreal Fest talk that discusses changes to Materials in 5.7: [Unreal Materials: New Features & Productivity Enhancements | Unreal Fest Stockholm 2025](https://youtu.be/KYmd_LNlw2c?si=qmCh89qkrIU3mcOI&t=1470)

Added a platform cached ini value that determines whether to compile 128-bit base pass pixel shader variations on platforms that require them. These are infrequently needed, and turning them off can save 50k shaders / 15 MiB at runtime: r.128BitBPPSCompilation.Allow (default is true, for backwards compatibility). Notes:

Worthwhile to check if you can disable this in your project.
Used for Pixel Formats on RenderTargets that require: PF_A32B32G32R32F (32-bit per channel, 4 channels)
Check for water rendering, scene capture targets, or search for functions such as PlatformRequires128bitRT in source.

All the notes below regarding Temporal responsiveness and TSR are best explained through the Unreal Fest Material Enhancements talk which discusses these changes.

Material Editor: Added two experimental custom output nodes Motion Vector World Offset (Per-Pixel) and Temporal Responsiveness to give users a way to modify per pixel motion vectors and set how responsive the temporal history is.

Temporal Responsiveness: Describes how temporal history will be rejected for different velocity mismatch levels.
- Default: 0
- Medium: [0, 0.5]
- Full: [0.5, 1.0]
- Now, it will be used by TSR to change rejection heuristics. Translucency material can also use it to request higher responsiveness if depth is written (r.Velocity.TemporalResponsiveness.Supported=1) or clipped away (r.Velocity.OutputTranslucentClippedDepth.Supported=1).
- The translucency mask generated improves TSR thin geometry quality.
Motion Vector World Offset (Per-Pixel): Works similar to the Previous Frame input of Previous Frame Switch but in the pixel shader. Regions with invalid velocity will be approximated with the current frame’s offset.
- This function currently supports non-nanite meshes only in non-basepass velocity write and is implemented in two passes. Use r.Velocity.PixelShaderMotionVectorWorldOffset.Supported to enable it.

TSR:

Added an option to allow all shading models to use TSR thin geometry detection when r.TSR.ThinGeometryDetection.Coverage.ShadingRange=2.
Added exposure offset (r.TSR.ShadingRejection.ExposureOffset) back so global darkening ghosting can be improved. The value behaves a little differently and is now used to adjust the exposure offset for the reject shading moire luma and guide history. Any legacy project before UE5.5 using the old CVar should consider adjusting the value.

Added a new debug artifact (ShaderTypeStats.csv), dumped by default for all cooks to the ShaderDebugInfo folder for each shader format.

This CSV file contains permutation counts/code sizes for all shaders in the shader library, grouped by shader type.
Note that this is not directly representative of final shader memory usage since it doesn’t account for potential duplication of bytecode introduced by the pak chunking step (where shaders used in multiple pakfiles will have a copy in each).
This is only intended to be used as a tool for tracking and comparing shader growth over time or between cooks.

Niagara

Removed UNiagaraEmitter from stateless emitter on cook. (Also known as Lightweight Emitters)

The cooked build does not require them; various parts of the UI still make assumptions about them and will be cleaned up.
This saves ~4k per stateless emitter.

Use the component transform for local space instead of the simulation transform. When RequiresCurrentFrameData is disabled (for performance), these could get out of sync, causing mismatches with other renderers on the render thread (i.e. sprites & meshes).

Disabling RequiresCurrentFrameData is an optimization checkbox that allows for better scheduling of the Niagara emitter ticks.

PCG - FastGeometry

PCG GPU now leverages FastGeometry components to further improve game thread performance when using the framework to generate and spawn a high density of static meshes, such as ground scatter and grass. It removes the need for any partition actors and creates local PCG components on the fly.

To benefit from the improvement, enable the PCG FastGeo Interop plugin in your project and set the CVAR pcg.RuntimeGeneration.ISM.ComponentlessPrimitives to 1.

Mass Runtime

This update includes a series of low-level optimizations and architectural refinements to the Mass framework, aimed at improving overall performance, memory efficiency, and system stability in real-time scenarios.

The most notable addition is Processor Time-Slicing, which means long-running processors can be split across multiple frames. This helps reduce performance spikes and enables better distribution of heavy workloads, particularly useful in large-scale or simulation-heavy environments.

Rendering

Release/Recycle Virtual Texture spaces after they are unused for some time.

We don’t release immediately to avoid cases during loading where this might trigger unnecessary Recreate/Resize work.
The old behavior was to never Release/Recycle unless we ran out of space slots.
Added r.VT.SpaceReleaseFrames to control this. The default is to release after 150 frames. Setting to -1 will return to the old behavior.
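The release policy boils down to an "unused for N frames" check. A sketch assuming a per-space last-used frame counter; `ShouldReleaseSpace` is hypothetical and only mirrors the described r.VT.SpaceReleaseFrames semantics (whether the engine compares with `>=` or `>` is an assumption here):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical policy check: release a virtual texture space once it has gone
// unused for `releaseFrames` frames. A negative value (e.g. -1) means the
// space is never released by this check (the old behavior).
bool ShouldReleaseSpace(uint64_t lastUsedFrame, uint64_t currentFrame, int releaseFrames) {
    if (releaseFrames < 0) {
        return false;
    }
    assert(currentFrame >= lastUsedFrame);
    return currentFrame - lastUsedFrame >= static_cast<uint64_t>(releaseFrames);
}
```

The delay exists so that a space briefly going unused during loading does not immediately trigger the Recreate/Resize work mentioned above.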

FBIK and Retargeting Performance

We added performance improvements and control in Full Body IK and IK Retargeter. This provides a way for you to have performant retargeting at runtime.

Improve posing with FBIK without a heavy performance cost (~20% speed increase).
Use the stretch limb solver in IK Rig for performant runtime retargets.
Use the FK rotation mode Copy Local for faster rotation transfer per chain.
Enable performance profiling on the retarget stack.
Add a LOD threshold per retarget operator.

State Tree Runtime

Memory Optimization for Static Bindings: Delegate dispatchers, listeners, and property references are now stored outside of instance data, reducing per-node memory overhead.

AsyncNavWalkingMode can use navmesh normal instead of physics hit normal.

Now always take the highest hit location in z when searching for ground location.
Note: AsyncNavWalkingMode is a movement mode that is part of the new Mover 2.0 component and is not available in the Character Movement Component.

NavMesh: Added a navmesh tile building optimization when using CompositeNavModifiers with lots of data.

Audio

Made memory improvements for MetaSounds in the Operator Cache.

Added a new Virtualization Mode: Seek Restart. This mode will virtualize, but keep track of playback time and, when realized, seek to that time as if it was playing the whole time. It’s a less accurate, but more performant alternative to Play When Silent.
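Conceptually, Seek Restart only has to track elapsed time while virtualized and compute a seek position on realization. A sketch of that calculation, assuming a sound with a known duration; `SeekTimeOnRealize` is a hypothetical helper, and the looping handling is an assumption that the release notes do not specify:

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Hypothetical "seek restart" model: while virtualized nothing is rendered and
// only time is tracked. On realization the sound restarts at the position it
// would have reached had it kept playing the whole time.
std::optional<double> SeekTimeOnRealize(double positionWhenVirtualized,
                                        double secondsVirtualized,
                                        double durationSeconds,
                                        bool looping) {
    assert(durationSeconds > 0.0);
    const double position = positionWhenVirtualized + secondsVirtualized;
    if (looping) {
        return std::fmod(position, durationSeconds);
    }
    if (position >= durationSeconds) {
        return std::nullopt;  // a one-shot would already have finished
    }
    return position;
}
```

This is cheaper than Play When Silent because nothing is decoded or mixed while virtualized, at the cost of not reproducing any state the sound would have built up during that time.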

Added an option to only support Vorbis decoding, not encoding, for memory-constrained targets.

Blueprint Runtime

Fixed a memory leak when using containers with types that allocate memory by default.

Mass

Added time-slicing with FMassEntityQuery::FExecutionLimiter to limit execution to a set entity count.
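Limiting execution to an entity count per frame means the processor needs a cursor to resume from on the next frame. A minimal sketch, assuming entities are addressed by index; `TimeSlicedProcessor` is hypothetical and does not reflect the actual FMassEntityQuery::FExecutionLimiter interface:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical time-sliced processor: each frame handles at most
// `maxEntitiesPerFrame` entities and resumes where it left off next frame.
class TimeSlicedProcessor {
public:
    explicit TimeSlicedProcessor(size_t maxEntitiesPerFrame)
        : maxPerFrame_(maxEntitiesPerFrame) {
        assert(maxPerFrame_ > 0);
    }

    // Processes the next slice of [0, entityCount) and returns how many
    // entities were handled this frame.
    size_t ExecuteFrame(size_t entityCount) {
        if (cursor_ >= entityCount) {
            cursor_ = 0;  // previous pass finished (or the set shrank): start over
        }
        const size_t end = std::min(entityCount, cursor_ + maxPerFrame_);
        const size_t processed = end - cursor_;
        // Per-entity work for indices [cursor_, end) would go here.
        cursor_ = end;
        return processed;
    }

private:
    size_t maxPerFrame_;
    size_t cursor_ = 0;
};
```

With 10 entities and a limit of 4, successive frames process 4, 4 and 2 entities, and then a new pass starts.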

Added observer locking to command flushing. This change batches observer calls, significantly reducing the number of individual observer executions (~20 times fewer in CitySample).

Iris Networking

Iris will eventually replace Epic's current replication system. For those already using it in their projects, Epic made several more improvements.

Cleaner API Boundaries: We removed the Iris UReplicationBridge base class to reduce virtual call overhead and simplify code navigation. Most Iris systems already depend on UObjectReplicationBridge directly, so this change streamlines the inheritance model and avoids unnecessary indirection.

Implemented seamless travel support. If seamless travel support is not needed for a project the cvar net.Iris.AlwaysCreateLevelFilteringGroupsForPersistentLevels can be set to false to avoid unnecessary overhead of filtering objects in the persistent level.

Optimized polling to only poll objects/properties that are marked dirty.

NetEmulation:

Added netEmulation setting to simulate buffer bloat on game traffic.
Set PktBufferBloatInMS for outgoing packets and PktIncomingBufferBloatInMS for incoming packets.
- Ex: ‘netEmulation.PktBufferBloatInMS 1000’ will apply 1 second of buffer bloat on outgoing traffic.
netEmulation.BufferBloatCooldownTimeInMS can also be used to set a cooldown period between each buffer bloat occurrence.
- Ex: ‘netEmulation.BufferBloatCooldownTimeInMS 2000’ means buffer bloat is not applied for 2 seconds after a buffer flush.
Added a “BufferBloat” emulation profile that enables buffer bloat of 400ms (in and out) over an ‘average’ emulation profile.
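A rough model of how these settings could interact, assuming packets are held for the bloat duration, flushed together, and then pass through untouched during the cooldown. `BufferBloatEmulator` is a hypothetical class whose two constructor arguments loosely correspond to netEmulation.PktBufferBloatInMS and netEmulation.BufferBloatCooldownTimeInMS; it is not the engine's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical buffer-bloat model: packets are held for `bloatMs` and then
// flushed together; after a flush, packets pass straight through for
// `cooldownMs`.
class BufferBloatEmulator {
public:
    BufferBloatEmulator(int64_t bloatMs, int64_t cooldownMs)
        : bloatMs_(bloatMs), cooldownMs_(cooldownMs) {
        assert(bloatMs_ >= 0 && cooldownMs_ >= 0);
    }

    // Releases all held packets once the bloat window has elapsed.
    std::vector<int> Update(int64_t nowMs) {
        std::vector<int> released;
        if (!held_.empty() && nowMs - bloatStartMs_ >= bloatMs_) {
            released.swap(held_);
            cooldownEndMs_ = bloatStartMs_ + bloatMs_ + cooldownMs_;
        }
        return released;
    }

    // Sends a packet at `nowMs` and returns every packet delivered at this time.
    std::vector<int> Send(int64_t nowMs, int packetId) {
        std::vector<int> delivered = Update(nowMs);
        if (nowMs < cooldownEndMs_) {
            delivered.push_back(packetId);  // cooling down: no bloat applied
        } else {
            if (held_.empty()) {
                bloatStartMs_ = nowMs;  // first held packet opens a bloat window
            }
            held_.push_back(packetId);
        }
        return delivered;
    }

private:
    int64_t bloatMs_;
    int64_t cooldownMs_;
    int64_t bloatStartMs_ = 0;
    int64_t cooldownEndMs_ = std::numeric_limits<int64_t>::min();
    std::vector<int> held_;
};
```

With a 1000 ms bloat and a 2000 ms cooldown, packets sent at 0 and 500 ms arrive together at 1000 ms, a packet at 1500 ms passes immediately, and one sent at 3100 ms is held again.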

Added metrics to measure the performance and behavior of the multi-server proxy.

Multithreaded Iris Polling Step:

FObjectPoller can now kick off multiple FReplicationPollTasks, each of which processes a subset of the ObjectsConsideredForPolling array.
This is set to process cache line sized Chunks in an interleaved pattern so that each task gets a roughly balanced amount of work, and we avoid false sharing the same cache line.
Added capability for Networking Iris Polling phase to be run in parallel. This can be enabled by adding bAllowParallelTasks=true to the ReplicationSystemConfig
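The interleaved chunk assignment can be sketched as follows, assuming 64-byte cache lines, 32-bit entries in the polling array, and an array that starts on a cache-line boundary. `ChunksForTask` is a hypothetical helper, not FObjectPoller code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Illustrative assumptions: 64-byte cache lines and 32-bit entries.
constexpr size_t kCacheLineBytes = 64;
constexpr size_t kChunkElements = kCacheLineBytes / sizeof(uint32_t);  // 16

// Returns the [begin, end) index ranges processed by task `taskIndex` out of
// `numTasks`. Chunk c goes to task c % numTasks, so work is interleaved and
// no two tasks write into the same cache-line-sized chunk.
std::vector<std::pair<size_t, size_t>> ChunksForTask(size_t numElements,
                                                     size_t numTasks,
                                                     size_t taskIndex) {
    assert(numTasks > 0 && taskIndex < numTasks);
    std::vector<std::pair<size_t, size_t>> ranges;
    for (size_t begin = taskIndex * kChunkElements; begin < numElements;
         begin += numTasks * kChunkElements) {
        const size_t end = std::min(begin + kChunkElements, numElements);
        ranges.emplace_back(begin, end);
    }
    return ranges;
}
```

With 100 entries and two tasks, task 0 gets chunks 0, 2, 4 and 6 (52 entries) and task 1 gets chunks 1, 3 and 5 (48 entries), so the load stays roughly balanced even though the last chunk is partial.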

Procedural

Fixed multiple performance bugs where GPU-resident data is unnecessarily read back when passing through Change Grid Size nodes.

Added LLM tags for tracking memory allocations.

Added a debug feature that repeatedly dispatches the compute graph to enable profiling, compiled out by default (PCG_GPU_KERNEL_PROFILING).

Added specific graph cache enabled and budget CVars for editor worlds vs game worlds.

Platform Android

Added -FastIterate flag to Visual Studio/Rider/Quick Launch, to have libUnreal.so outside the APK, and made build iteration faster in general.

Now starts loading libUnreal dependencies early for faster startup time

Landscape Editing

Landscape optimization for the editor: no longer inspects materials to find mobile layer names when generating mobile weightmap allocations if the landscape component has no weightmap allocations in the first place.

Performance optimization when undoing landscape edits.

Improved landscape.DumpLODs command: Now it only displays the landscape LOD information by default and -detailed needs to be used to get the verbose mip-to-mip delta information.

Remove redundant and unsafe non-blocking landscape Nanite build from PreSave. No longer builds Nanite at all on auto-saves.

Dev Tools

Improved the summary outputs of UnrealPak. The units of memory now scale with the number of bytes making it easier to read for both smaller and larger containers.

Build

Prevented Clang builds for Windows from launching multiple link jobs simultaneously to avoid memory exhaustion.

ThinLTO is now enabled by default for Clang based toolchains.

Note: “LTO (Link Time Optimization) achieves better runtime performance through whole-program analysis and cross-module optimization. However, monolithic LTO implements this by merging all input into a single module, which is not scalable in time or memory, and also prevents fast incremental compiles.” - Clang docs

Added support for AVX10.

Interchange (Asset Importing)

Reduced memory usage when importing static meshes & skeletal meshes.

Textures imported with non-power-of-2 dimensions or non-multiple-of-4 dimensions were previously automatically set to NeverStream and Uncompressed. This can now be toggled with the config setting TextureImportSettings::bDoAutomaticTextureSettingsForNonPow2Textures.
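One plausible reading of that rule as a predicate: a dimension only avoids the automatic settings if it is both a power of two and a multiple of 4, which matters because block-compressed formats (BC1 to BC7) encode 4x4 texel blocks. `NeedsNonPow2ImportSettings` is hypothetical; the actual check lives in the importer and may differ.

```cpp
#include <cassert>
#include <cstdint>

bool IsPowerOfTwo(uint32_t v) {
    return v != 0 && (v & (v - 1)) == 0;
}

// Hypothetical predicate: true if either dimension would trigger the automatic
// NeverStream/Uncompressed import settings described above.
bool NeedsNonPow2ImportSettings(uint32_t width, uint32_t height) {
    assert(width > 0 && height > 0);
    auto meetsDefaults = [](uint32_t d) { return IsPowerOfTwo(d) && d % 4 == 0; };
    return !meetsDefaults(width) || !meetsDefaults(height);
}
```

Note that tiny power-of-two sizes such as 2x2 still fail the multiple-of-4 part of the rule.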

Also when textures initially imported as nonpow2 are re-imported as pow2, they now get their settings automatically fixed to the defaults for pow2 textures.

Implemented a Nanite triangle threshold value on the Interchange generic asset pipeline, in order to enable Nanite only for meshes past a triangle count size.

UI

Significantly improved the Content Browser column sort performance for large search results.

Incremental Cooking

Incremental Cooking is now in Beta, though it is not immediately clear which improvements made it into this release.

Build Health Tracking and Visualization (Experimental)


As part of the Horde analytics tooling, we've introduced a new experimental Build Health dashboard that gives teams a way to monitor and inspect which BuildGraph steps across the project's changelists are completing as expected and which are reporting errors.

This is part of ongoing development to provide built-in functionality in Horde that helps teams better understand the most common causes of build failures, monitor pressure on agent pools, and track other build performance metrics that affect your iteration times.

Closing

This time around I omitted a lot more of the smaller performance wins to keep it a bit shorter and focus on the most impactful changes and the changes you should be aware of when upgrading engine versions.

If you are interested in learning more about game performance optimization, I have a professional training course used by dozens of AAA studios. Get more information here, or reach out directly about team enrollment and studio training.

You may follow me on Twitter/X, or LinkedIn for everything Unreal Engine performance related!

Unreal Engine 5.6 Performance Highlights

2025-07-16T00:00:00+00:00

The release of Unreal Engine 5.6 brings a lot of incredible performance improvements to the engine. In these alternative release notes, I have filtered the list down to the most interesting optimizations and performance-related changes. Where appropriate, I have added my own notes to explain more clearly or give context, as the original notes can sometimes be rather vague or short.

This is not every single optimization that made it into 5.6; instead it is primarily those I think you should be aware of, that might change previous assumptions, or that require manual changes or CVAR tuning. You can find the original full release notes here.

Summary

The biggest highlights of this release are continued improvements to Unreal 5's core render features (Nanite, Lumen, especially HWRT, and Virtual Shadow Mapping), which are valuable to almost everyone. There are further major improvements to renderer parallelization, with other systems becoming increasingly multithreaded. World streaming gets a new experimental plugin developed with CDPR, and there are scattered mentions of removed hitches and stalls from a variety of render-related sources. Finally, a huge overhaul of the GPU Profiler and Insights GPU stats brings major enhancements to how we profile and reason about GPU performance.

Unreal Insights

Added “ConsoleCmd” CPU scoped trace event for console input processing (includes the console command execution). The CPU timer has the actual command string as metadata.
Improved the debug stats/counters for TraceLog:
- Added defines in TraceAuxiliary.cpp to easily enable/disable code for STAT, TRACE, LLM and/or CSV stats/counters.
- Added additional debug counters/stats:
  - “Cache Unused”
  - “Emitted not Traced”
  - “Memory Error” (total - block pool - fixed - shared - cache).
- Also added trace counters API stats and made it the default (instead of STAT counters).
- Added block pool, fixed buffers, shared buffers, emitted and traced stats also to the output of “trace.status” console command.
- Added initial zero values to all stats/counters (improves graph display in Insights for all the stats).
- Simplified the code for registering EndFrame callbacks.
- Added support for registering a callback to be called each time TraceLog updates: added OnUpdateFunc in UE::Trace::FInitializeDesc + added UE::Trace::SetUpdateCallback(func).
- The new trace update callback is now used to emit stats/counters also during engine initialization. Once the engine finishes the initialization, the stats counters will be emitted only once per frame (by resetting the update callback and further emitting stats from end frame updates).
CpuProfilerTrace (“cpu” trace channel):
- Added support for FName variants of CpuProfilerTrace macros:
  - for TRACE_CPUPROFILER_EVENT_SCOPE_STR (when the provided event name does not change)
  - for TRACE_CPUPROFILER_EVENT_SCOPE_TEXT (when the provided event name is dynamic).
- FEventScope and FDynamicEventScope can also now be initialized with an FName.
CpuProfilerTrace:
- Added variations for _STR macros (_STR_ON_CHANNEL_CONDITIONAL vs. _ON_CHANNEL_STR_CONDITIONAL vs. _ON_CHANNEL_CONDITIONAL_STR).
Add ability to show callstacks for bookmarks in Unreal Insights. Enable callstack,module,bookmark channels to gather this information from the tracing target.
Add optional profiler hooks in TraceLog, and instrument common worker thread functions.
Stats: Support flags in lightweight stats
- This re-uses existing declaration macros (that were previously only utilized when STATS is 1) to define lightweight structs containing information about the stat such as the flags and group.
- This allows us to do things like properly filter out verbose stats when profiling in the Test config (where STATS is 0).
Added tracing for all LLM tag sets (assets and asset classes).
CounterTrace: Added more options to set the value of a trace counter:
- TRACE_COUNTER_SET_IF_DIFFERENT(CounterName, Value)
- TRACE_COUNTER_ADD_IF_NOT_ZERO(CounterName, Value)
- TRACE_COUNTER_SUBTRACT_IF_NOT_ZERO(CounterName, Value)
- TRACE_COUNTER_SET_ALWAYS(CounterName, Value)
- TRACE_COUNTER_ADD_ALWAYS(CounterName, Value)
- TRACE_COUNTER_SUBTRACT_ALWAYS(CounterName, Value)
- These can be used regardless of how the counter is created (“checked” or “unchecked”).
Allow platforms to filter modules included in callstack tracing.
Timing Regions now support an optional second parameter to specify a category (Regions macros and FMiscTrace::OutputBeginRegion). This allows filtering and grouping in the Insights Frontend.
CsvProfiler - Report explicit event wait names as trace scopes.
MemoryTrace: Fixed memory tracing to enable tracing of tags and mem scope with “memalloc” channel (and not with “memtag” channel which is used to enable tracing from LLM).
Removed FName block allocations and TextureManager construction from contributing to the MemQuery/Memtrace asset memory cost readings, which increases the precision of per-asset memory consumption reports.
Marked the Profiler* modules as deprecated. The old Profiler has actually been deprecated since UE 5.0, superseded by Trace/UnrealInsights.
Deprecate TraceDataFilters plugin because the functionality has been moved to the Trace Control Widget from Session Frontend and the Live Trace Control Widget from Unreal Insights.
Audio: Added Unreal Insights asset scopes for MetaSound building and rendering memory use.
Mass: Added Mass trace channel for MassEntity Unreal Insights plugin
Adds a data stream type for Trace analysis which allows a runtime to connect directly to the analysis session, bypassing UnrealTraceServer. The session is analyzed directly in memory and not stored on disk. The direct socket connection is using a different port than the standard networked trace connection, to avoid collisions with UTS. It is only available for local connections and in the same process.
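
The new counter variants mainly differ in when a value is actually written to the trace. As a rough, engine-free illustration (the `TraceCounterModel` class is hypothetical and is not how CountersTrace.h implements these macros), the sketch below records one event per emitted value so you can compare the “always” policy with the “if different” and “if not zero” policies:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, engine-free model of a trace counter. In Unreal you would use
// the TRACE_COUNTER_* macros from CountersTrace.h; this only records which
// values each policy would write to the trace.
class TraceCounterModel
{
public:
    // Like TRACE_COUNTER_SET_ALWAYS: every call emits an event.
    void SetAlways(std::int64_t Value)
    {
        Current = Value;
        bHasValue = true;
        EmittedEvents.push_back(Value);
    }

    // Like TRACE_COUNTER_SET_IF_DIFFERENT: emit only when the value changes.
    void SetIfDifferent(std::int64_t Value)
    {
        if (bHasValue && Current == Value)
        {
            return; // Unchanged: nothing is written to the trace.
        }
        SetAlways(Value);
    }

    // Like TRACE_COUNTER_ADD_IF_NOT_ZERO: skip no-op additions.
    void AddIfNotZero(std::int64_t Delta)
    {
        if (Delta != 0)
        {
            SetAlways(Current + Delta);
        }
    }

    std::int64_t Value() const { return Current; }
    const std::vector<std::int64_t>& Events() const { return EmittedEvents; }

private:
    std::int64_t Current = 0;
    bool bHasValue = false;
    std::vector<std::int64_t> EmittedEvents;
};
```

Feeding the values 5, 5, 5, 7, 7 through both policies emits five events with the “always” policy but only two with “set if different”. That is the kind of redundant trace traffic these variants let you skip for counters that rarely change. In game code you would keep using the real TRACE_COUNTER_* macros.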

Insights Asset Memory Profiling (Experimental)

In Unreal Engine 5.6, Unreal Insights introduces new (experimental) Low Level Memory (LLM) tracing of assets within your projects. Launch your client with the appropriate arguments to enable asset memory tracing. The functionality includes:

The ability to define per-platform memory budgets per asset type (see the LLM Timeline / TagSet TreeView).
Switch analysis between TagSets, specifically the System, AssetClass, and Asset TagSets.
Sort TagSet entries by name and size.
See all the entries and associated budgets per TagSet; anything out of budget is clearly indicated.
A/B comparison of memory usage from one frame to another.

GPU Profiler 2.0

Unreal Engine 5.6 introduces a re-architected Insights GPU profiler.

The goal is to unify the existing profiling systems within the engine (Stats, ProfileGPU, Insights) so they use the same data stream, increasing the accuracy and consistency of what they report when profiling a scene.

We overhauled the way timestamps are collected for the GPU timeline and now have all RHIs produce this information in a common format. The new event stream and Insights API improvements surface more information in the new GPU profiling tools:

async queue, multi GPU, pipeline bubbles (GPU idle/busy states), cross-queue dependencies (fence waits)

Unfortunately, this overhaul removes the convenient in-editor ProfileGPU UI pop-up. To compensate, Epic massively improved the detail of the log dump produced by the ProfileGPU command.

Within Unreal Insights you’ll find the biggest improvements, as you now see two GPU tracks, one for Graphics and one for Compute, which is HUGE for understanding your GPU performance. Before 5.6 you couldn’t properly reason about the GPU stats, as work such as Async Compute was not displayed properly. Additionally, stat GPU did not clearly differentiate what runs as async compute either, making the stats somewhat misleading and difficult to reason about.

I’ve included examples of the new stat GPU and the log output from the ProfileGPU console command.

Renderer Parallelization

Render Thread performance is very often the limiting factor for UE titles. This is because previously some operations were restricted to this particular thread, even though current platforms and graphics APIs provide methods for them to be done in parallel. We improved performance by refactoring the Renderer Hardware Interface (RHI) API to remove these constraints and fully utilize the multithreading capabilities of the target hardware.

Virtual Shadow Maps Optimizations

Virtual Shadow Maps in Unreal Engine 5.6 further improve shadow performance and memory usage with optimized scene culling, while increasing fidelity and artistic control.

Detailed Changes

Implemented VSM per-instance deforming state tracking on GPU such that we can know when an instance switched state on the GPU and trigger invalidation.
Added receiver masks that can improve clipmap culling effectiveness significantly for dense scenes, especially with a dynamic light. Disabled by default and can be enabled using r.Shadow.Virtual.UseReceiverMask. There’s a potential for artifacts when used with r.Shadow.Virtual.MaxDOFResolutionBias.
Added per-chunk aggregate of shadow casting flag and early out in the chunk culling to improve culling efficiency.
Fixed the pre-cull instance counts for nanitestats virtualshadowmaps and optimized the loop iteration by using bit logic to skip empty bits.
Set clipmap far culling plane to just beyond the visible range when force invalidate is enabled, greatly reducing rendered geometry in some cases. Can be disabled using r.Shadow.Virtual.Clipmap.CullDynamicTightly (default true).
Skip queuing cache invalidations for dynamic geometry when receiver mask is enabled
Update some VSM options for best current paths:
- Remove ability to turn off StaticSeparate as it is fairly integral to how the fast path caching logic works now.
- Ensure r.shadow.virtual.cache 0 also enables the fast “uncached” paths on the nanite view
- Remove mode “1” for VSM nanite and non-nanite HZB. Only two pass occlusion culling on the current frame is supported. Not yet enabling receiver mask by default due to some memory overhead, but that is the intended path moving forward.
- I can no longer find r.Shadow.Virtual.Cache.StaticSeparate in 5.6, so it may now be forcibly enabled. This does impact memory usage, as it doubles it compared to having it disabled.
Support receiver mask for local lights (disabled by default, see r.shadow.virtual.usereceivermasklocal)
- Requires using non-greedy mip selection when enabled for SMRT and single sample lookups
- Cvar to enable/disable VSM overflow screen messages (enabled by default)
- Add force invalidate local cvar
- Remove greedy selection clipmap cvar; has been off for a while and increasingly clashes with the dynamic geo optimization strategies (receiver mask, etc)
Optimization to skip/simplify VSM light grid pruning when there are no VSM local lights
VSM: Support receiver mask with caching enabled.
- When enabled via r.Shadow.Virtual.UseReceiverMaskDirectional, receiver mask will be used for all dynamic pages (which will thus become uncached as it is unsafe to cache partial pages), but static pages can remain fully cached.
- This is generally a benefit to most applications as relatively static objects should migrate to the static pages, while dynamic pages are frequently invalidated every frame.
VSM Dynamic Z range tight culling when using receiver mask
Normalize a few default High scalability VSM settings across platforms.
- This ties back in with the new 60Hz Scalability Profiles that got tweaked this release.

Lumen

Lumen Hardware Ray Tracing Optimizations

In Unreal Engine 5.6, Lumen Hardware Ray Tracing (HWRT) mode delivers even greater performance on current-generation hardware. These low-level optimizations ensure faster, more efficient rendering, bringing high-end visual fidelity and scalability that now matches the frame budget of the software ray tracing mode. This frees up valuable CPU resources on your target platform and overall helps achieve a more consistent 60hz frame rate.

Detailed Changes

Lumen Scene Update CPU performance optimizations
ShortRangeAO is now running at half resolution with a denoiser. This makes it two times faster on console
Lumen Far Field optimizations make Far Field 30% faster on console, and ~50% faster when using the new occlusion-only far field mode (r.LumenScene.FarField.OcclusionOnly 1).
Added Single Layer Water reflection downsampling and denoising. Downsampling allows water reflections to be rendered at a lower resolution, and denoising makes them more stable.
Reworked Lumen Reflection radiance cache. It can now sample sky directly for higher quality (r.Lumen.ScreenProbeGather.RadianceCache.SkyVisibility) and has better controls where it needs to be applied
- Unclear whether this has associated performance gains with it, but it sure sounds good.
Implemented basic SER support for HitLighting and ray traced translucency. This can improve performance on hardware that supports SER, in scenes that use hit lighting and have many materials. This feature is controlled by the cvar r.Lumen.HardwareRayTracing.ShaderExecutionReordering.
Lumen Surface Cache update is now driven by distance to the frustum. This halved the number of updated pages, making it two times faster without a large impact on the GI update speed on High scalability.
Disable DistanceFieldRepresentation bit when HWRT is used with Lumen. This saves ~0.07ms (1080p, console) on skipping CopyStencilToLightingChannelTexture and skipping reading this bit during ScreenProbeGather and ShortRangeAO tracing.
Added r.Lumen.ScreenProbeGather.IntegrateDownsampleFactor.
- It allows downsampling the Screen Probe Gather integration, which makes this pass ~3 times faster (~0.3-0.5ms speedup on console at 1080p, depending on the content).
- Downsampled integration is pretty stable thanks to jittered and irregular sampling patterns, upsampling based on depth and normal, and full resolution temporal accumulation.
- The downside is that it does remove some of the fine grained normal detail making it blurry, so for now it’s not enabled by default
New ray weighting for Lumen Reflections. It improves reflection stability on some features and speeds up the reflection pass.
Lumen performance visualization view now uses different color brightness for different roughness ranges
Lumen Screen Probe Gather fast out optimizations for quickly skipping sky pixels.
New Lumen Screen Probe Gather adaptive probe placement algorithm. Major GI optimization due to ability to place less adaptive probes while retaining similar visuals
Lumen Surface Cache Radiosity pass optimizations
Reduce Lumen Reflections output format to 32 bits. This saves 0.02ms in Lumen Reflections and 0.03ms in Water Rendering on console at 900p
Changed Lumen default settings.
- SWRT detail traces (mesh SDF tracing) is now a deprecated path, which won’t be worked on much.
  - Important note! It’s really important that Lumen eventually reaches a single path, so developers can focus on HWRT entirely and not re-author content for multiple configurations.
- For scaling quality beyond SWRT global traces, it is recommended to use the HWRT path instead.
- Firefly filtering is now more aggressive by default (r.Lumen.ScreenProbeGather.MaxRayIntensity 10 instead of 40).
  - This removes some interesting GI features, but also reduces noise, especially from things like small bright emissives.
Fixed async compute overlap when async Lumen reflections are enabled
Fixed Lumen Radiance Cache cache update time splicing causing major performance spikes on fast camera movement or disocclusion

Nanite

Added explicit chunk bounds to instance culling hierarchy and used those to make it possible to store and update only the bounds for the dynamic contents of a cell. This improves performance in CitySample by some 100us (which had regressed when we switched dynamic geometry to be “cullable”).
Fixed distance culling bug for Nanite rendering into CSM, where it would not set up the culling view overrides correctly leading to issues with e.g., per instance cull distance (foliage).
Added specialization for single-view case (visbuffer) for the chunk based instance cull shader allowing the compiler to remove the loop and significantly lower register pressure.
Added specialized instance cull for static geometry to reduce the cost as well as register pressure. Disabled by default due to potential failure modes (r.Nanite.StaticGeometryInstanceCull).
Added aggregate instance draw distance and culling to hierarchy cells and chunks. This can be a significant win in scenes with many small instances that use per-instance culling ranges.
- Not entirely clear, but this likely affects foliage rendering the most as normal nanite meshes don’t support max draw distance or similar.
Changed hierarchical instance culling to work on chunks of 64 instances rather than on cells and support culling GPU-updated instances. Significantly improves instance culling in many cases, especially scenes with large amounts of GPU-generated PCG instances.
Bugfix: Fix regression of LOD generation (when Nanite is enabled) that would allow meshes with low poly counts to simplify too much in generated LODs and destroy their silhouettes.
- It may be time to verify that none of your fallback meshes (used for collision geometry and Lumen HWRT, among other things!) have lost their silhouettes.
Updated Nanite Software VRS to respect the per-material “Allow Variable Rate Shading” checkbox.
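
Several of the Nanite items above share one idea: store aggregate data (bounds and the largest per-instance draw distance) per chunk of instances, so whole chunks can be rejected without testing each instance. The sketch below is a CPU-side, hypothetical model of that idea (the types and names are mine; Nanite does this on the GPU and its data layout will differ), using the 64-instance chunk size mentioned in the notes:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model of chunked instance culling with an aggregated draw distance.
struct Vec3 { float X, Y, Z; };

static float Distance(const Vec3& A, const Vec3& B)
{
    const float DX = A.X - B.X, DY = A.Y - B.Y, DZ = A.Z - B.Z;
    return std::sqrt(DX * DX + DY * DY + DZ * DZ);
}

struct Instance { Vec3 Position; float MaxDrawDistance; };

struct Chunk
{
    Vec3 Center{0, 0, 0};
    float Radius = 0.0f;          // Bounding sphere around all instance positions.
    float MaxDrawDistance = 0.0f; // Aggregate: the largest per-instance range.
    std::vector<Instance> Instances;
};

constexpr std::size_t ChunkSize = 64;

std::vector<Chunk> BuildChunks(const std::vector<Instance>& All)
{
    std::vector<Chunk> Chunks;
    for (std::size_t Start = 0; Start < All.size(); Start += ChunkSize)
    {
        Chunk C;
        const std::size_t End = std::min(All.size(), Start + ChunkSize);
        C.Instances.assign(All.begin() + static_cast<std::ptrdiff_t>(Start),
                           All.begin() + static_cast<std::ptrdiff_t>(End));
        // Center = average position; radius = farthest instance from it.
        for (const Instance& I : C.Instances)
        {
            C.Center.X += I.Position.X; C.Center.Y += I.Position.Y; C.Center.Z += I.Position.Z;
        }
        const float N = static_cast<float>(C.Instances.size());
        C.Center = Vec3{C.Center.X / N, C.Center.Y / N, C.Center.Z / N};
        for (const Instance& I : C.Instances)
        {
            C.Radius = std::max(C.Radius, Distance(C.Center, I.Position));
            C.MaxDrawDistance = std::max(C.MaxDrawDistance, I.MaxDrawDistance);
        }
        Chunks.push_back(std::move(C));
    }
    return Chunks;
}

// Returns the visible instance count; InstanceTests counts per-instance checks.
std::size_t CullInstances(const std::vector<Chunk>& Chunks, const Vec3& Camera, std::size_t& InstanceTests)
{
    std::size_t Visible = 0;
    InstanceTests = 0;
    for (const Chunk& C : Chunks)
    {
        // The closest any instance in this chunk can be to the camera.
        const float NearestPossible = Distance(Camera, C.Center) - C.Radius;
        if (NearestPossible > C.MaxDrawDistance)
        {
            continue; // Every instance is out of range: skip the whole chunk.
        }
        for (const Instance& I : C.Instances)
        {
            ++InstanceTests;
            if (Distance(Camera, I.Position) <= I.MaxDrawDistance)
            {
                ++Visible;
            }
        }
    }
    return Visible;
}
```

The chunk skip is conservative: if even the closest point of a chunk's bounding sphere lies beyond the largest draw distance in that chunk, no instance inside it can be visible. Per-instance tests are therefore only paid for chunks that might contribute, which is where scenes with many small, short-range instances would benefit.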

Lighting

Ray Tracing: Added debug visualization for flags and masks. New picker domains visualize the unique flag and mask settings throughout the scene. This can be used to validate which modes are being used, and is particularly useful for tracking down materials that require any-hit shading.

MegaLights

Exposed r.MegaLights.DownsampleFactor, which allows switching between a 1x and 2x downsampling factor for scaling quality up.

Niagara

Add a mesh LOD option to use the component origin as the source of the LOD calculation
- This is more stable if you have dynamic bounds and allows for a single mesh to be rendered vs per particle LODs
Enable Niagara Editor Performance Stats for new VM
- The new VM doesn’t break down costs per module and will only display information for each stack group
- As far as I know, Niagara has been working on a new Virtual Machine for math calculations (The VM allows the same math/code to run on both CPU and GPU which makes swapping these modes so seamless for us as devs).
When using performance mode, “compile for edit” is disabled to ensure performance measurements are more accurate
- One measurement I made showed a ~40% delta between edit mode and non-edit mode
- There is a separate option to enable edit mode for profiling

Niagara Heterogeneous Volumes

In Unreal Engine 5.6, Heterogeneous Volumes are production-ready and bring further optimizations in downsampling and runtime performance on PC and 9th generation consoles.

Bilateral upsampling is now employed when rendering at downsampled resolution.
Expensive operations such as evaluating fog in-scattering and indirect lighting have been approximated to lower VGPR pressure and tighten the main ray marching loop.
Calculation of indirect lighting is optionally performed within the lighting cache calculations to reduce ray marching complexity and lower VGPR usage.
Fog in-scattering is optionally lifted out of the main ray marching loop and interpolated to improve real-time performance.
Hardening of the Heterogeneous Volume component allows for more robust operation when running in-game.
Beer Shadow Maps are optionally employed when mixing with translucent rendering; an approximation but more performant for real-time applications

Rendering / RHI

Stat unit now displays the current render resolution
- This is actually useful to see reported clearly in viewport.
Changed the shadow fade out to depend on the shadow resolution scale to give greater control over how individual lights fade out their shadow, the old behavior can be enabled by turning off “r.Shadow.DoesFadeUseResolutionScale”.
- Looking at the code, this is for non-virtual shadow mapping.
Allow light culling to be run on async compute.
Made the light grid feedback pass also run on the async queue to prevent it from causing a sync.
Quantized Automatic View Texture Mip Bias to significantly reduce the number of independent sampler states used when Dynamic Resolution is enabled, preventing crashes in long-running applications that hit the sampler limit.
- Number of quantization steps is controlled with r.ViewTextureMipBias.Quantization (defaults to 1024).
Added support for reserved resources for the GPU-Scene instance data buffer to remove the need to perform copies on grow/shrink, reducing hitches and peak GPU-memory use. Disabled by default, may be enabled using e.g., r.GPUScene.InstanceDataTileSizeLog2 12 on supporting platforms.
Add per-group auto LOD bias for finer control.
TSR: Improved the temporal stability of thin geometry like foliage and hair (controlled by r.TSR.ThinGeometryDetection, off by default). It should largely reduce patch flickering.
- Use r.TSR.Visualize 15 to visualize edge line (red), partial coverage (green), and others (yellow). Only red and green regions have stability improved.
DX12: Fix a case where a temporary staging texture free wasn’t being traced, which resulted in Memory Insights reporting ever-growing memory usage.
Fix memory leak due to unreleased ref. count on skeletal mesh LOD data.
PSO: Fixed a bug in PipelineStateCache::GetAndOrCreateComputePipelineState that would trigger an unnecessary stall on the render thread. Precached PSOs should not be added as a dispatch prerequisite on the RHI command list, since they aren’t used for drawing.
- Additional PSO improvements have been made, but this one, which fixes potential stalls, sounded the most interesting.
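
Quantizing the mip bias works because each distinct bias value needs its own sampler state, and dynamic resolution produces an almost continuous stream of values. The sketch below is a hypothetical model of that effect: the bias formula (log2 of render width over output width), the [-2, 0] range and the rounding scheme are assumptions for illustration; only the idea of a configurable step count (r.ViewTextureMipBias.Quantization, default 1024) comes from the notes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <set>

// Hypothetical: snap a continuous mip bias onto a grid of Steps intervals
// spanning [MinBias, MaxBias]. The range is an assumption, not engine behavior.
float QuantizeMipBias(float Bias, float MinBias, float MaxBias, int Steps)
{
    assert(Steps > 0 && MaxBias > MinBias);
    const float Clamped = std::fmin(std::fmax(Bias, MinBias), MaxBias);
    const float StepSize = (MaxBias - MinBias) / static_cast<float>(Steps);
    const float Index = std::round((Clamped - MinBias) / StepSize);
    return MinBias + Index * StepSize;
}

// Assumed bias formula for illustration: log2(render width / output width).
float MipBiasForResolution(float RenderWidth, float OutputWidth)
{
    return std::log2(RenderWidth / OutputWidth);
}

// Counts how many distinct sampler states a dynamic resolution sweep would need.
std::size_t CountDistinctBiases(int Frames, int Steps, bool bQuantize)
{
    std::set<float> Distinct;
    for (int Frame = 0; Frame < Frames; ++Frame)
    {
        // Render width drifting from 50% towards 100% of a 3840 pixel output.
        const float RenderWidth = 1920.0f + 1920.0f * static_cast<float>(Frame) / static_cast<float>(Frames);
        float Bias = MipBiasForResolution(RenderWidth, 3840.0f);
        if (bQuantize)
        {
            Bias = QuantizeMipBias(Bias, -2.0f, 0.0f, Steps);
        }
        Distinct.insert(Bias);
    }
    return Distinct.size();
}
```

Sweeping the render width across 100,000 frames yields many thousands of distinct raw bias values, while the quantized version can never exceed the number of grid points (1,025 with this range and step count). Bounding that count is what keeps long-running sessions under the sampler limit.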

Materials & Shaders

ListShaders console command added, similar to ListTextures, for runtime analysis of shader memory / loads.
Change default value of r.VT.PageFreeThreshold from 60 to 15. This has been seen to be a better default setting on a range of internal projects. It reduces pool residency on fast camera movement which can slow down page production causing pop in.
Change default Virtual Texture behavior in cooked builds to not wait on render thread for root pages to be mapped. Instead we read from an average fallback color generated during texture compilation. This removes render thread hitches.
Materials can now opt themselves out of Static Mesh vertex factories. This is an advanced option, defaults to true, and can be used to reduce vertex factory compilations and memory. It is useful for UI, Niagara, Skeletal Mesh, and other materials that will never be used on a Static Mesh.
Added a new cook artifact (shadertypestats.csv) that can be used for more granular tracking of shader/shader type growth over time.
- It contains, for each shader type, both the number and total memory size of all unique shaders, prior to chunking.
- Note that this isn’t directly representative of final shader memory/disk size since it doesn’t account for shader library deduplication, or shader library chunk re-duplication (the case where multiple shadermaps which would have shared bytecode end up in different chunks).
- This artifact is saved to both the root of the ShaderDebugInfo folder for each cooked shader format, and also renamed to include the shader format name and saved in the Metadata folder for a cooked build.
Add CPU distance based streaming for virtual textures. Virtual textures can opt into having mip levels streamed using the existing regular texture streaming logic. A budget is configured using r.Streaming.PoolSizeForVirtualTextures.
Add a VirtualTextureStreamingPriority setting to TextureLODGroup and Texture assets. We use it to prioritize when collecting virtual texture pages to populate.

RHI - Bindless Resources

Bindless resources are a low-level feature related to the management of textures and other types of data buffers in modern Renderer Hardware Interfaces (RHIs) such as DX12, Vulkan, and Metal. We added support for bindless resources to provide the means for more flexible GPU programming paradigms and additional features within the renderer, and as a requirement for full ray tracing support on Vulkan.

While not a direct user-facing feature, support for bindless might be of interest to some users writing C++ plugins and custom engine modifications relevant to rendering.

Optimized Device Profiles for 60Hz

Unreal Engine 5.6 provides out-of-the-box, up-to-date device profiles that reflect Epic Games’ Fortnite-optimized settings to achieve 60fps on all supported platforms.

Procedural Content Generation (PCG)

There are a lot of optimizations in the release notes that you may want to dive into (see the PCG section) if you use PCG heavily.

Added fine grained time slicing to compute graph dispatch to help stay within execution budget.
Optimized dispatch of GPU graphs to reduce game thread cost.
Optimized runtime generation to reduce game thread cost.
GPU Grass & Micro Scattering: Added support to PCG GPU compute for sampling the Landscape RVT and Grass maps directly on GPU in order to build efficient and customizable runtime grass spawning.
GPU Compute Performance Improvements
Reduced overall memory consumption when working with PCG.
Improved the PCG Framework execution efficiency for both offline in-editor and runtime use cases.
New Experimental PCG Instanced Actor interop plugin to spawn and take advantage of the instanced actor system.
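
The first PCG item mentions fine-grained time slicing to stay within an execution budget. Generically, that means doing a bounded amount of work per frame and carrying the rest over to the next frame. The sketch below is a hypothetical, engine-independent version (not PCG's scheduler); it uses per-item cost estimates instead of a real clock so the behavior is deterministic:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical time-sliced work queue. Each item carries an estimated cost
// (e.g. microseconds); Tick() runs items until the per-frame budget would be
// exceeded and leaves the rest for the next frame.
class TimeSlicedQueue
{
public:
    void Enqueue(double EstimatedCostUs, std::function<void()> Work)
    {
        Items.push_back({EstimatedCostUs, std::move(Work)});
    }

    // Returns the number of items processed this frame.
    std::size_t Tick(double BudgetUs)
    {
        std::size_t Processed = 0;
        double Spent = 0.0;
        while (!Items.empty())
        {
            const Item& Next = Items.front();
            // Always make progress on at least one item, even if it exceeds the budget.
            if (Processed > 0 && Spent + Next.CostUs > BudgetUs)
            {
                break;
            }
            Spent += Next.CostUs;
            Next.Work();
            Items.pop_front();
            ++Processed;
        }
        return Processed;
    }

    std::size_t Pending() const { return Items.size(); }

private:
    struct Item { double CostUs; std::function<void()> Work; };
    std::deque<Item> Items;
};
```

With ten 100 µs items and a 250 µs budget, each frame processes two items, so the queue drains over five frames instead of spending 1 ms in a single frame. The finer the work items, the closer each frame can get to its budget without overshooting it.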

Animation

Optimize RigLogic for low LOD evaluation (targeting low-power devices), bringing a 30-40% performance improvement
Optimized RichCurve evaluation
Optimized skinned mesh proxy creation
Added memory usage estimate during animated curve compression
Bugfix: Fix missing fast path icons for blend space nodes
Added option to disable animation ticking when a skeletal mesh is directly occlusion/frustum culled
Fix incorrect comparison of VisibilityBasedAnimTickOption in AnimationBudgetAllocator.
Allowed post process Animation Blueprints to be applied based on a LOD threshold per-component
Fixed OnlyTickMontagesAndRefreshBonesWhenPlayingMontages option updating the anim instance multiple times

World Streaming

A MAJOR improvement for world streaming, the Fast Geometry Streaming Plugin, is now Experimental in 5.6. This is a huge collaboration between Epic and CDPR to fix some long-standing issues with streaming in large and complex open worlds. The full notes contain a LOT of detailed settings and CVars to enable, so rather than repeat them here, I’ll list some highlights that you should definitely look into.

Allows tuning of per-frame streaming budgets for things such as AddToWorld, RemoveFromWorld and mesh streaming (see the full notes for the budgets they used for CitySample).
Asynchronous physics state creation/destruction.
- This is huge for relieving the GameThread during streaming/spawning of actors.
Improved RemoveFromWorld or “Incremental EndPlay”.
Unified/Shared Time Budget for ProcessAsyncLoading and UpdateLevelStreaming
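
According to the notes, the shared budget means a hitch in ProcessAsyncLoading leaves less time for UpdateLevelStreaming, and time that level streaming does not use goes to processing more loaded assets. The sketch below models that accounting for a single frame; `RunUnifiedStreaming` and its three-phase order are hypothetical simplifications, not the engine's code:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-frame accounting for a shared streaming budget.
struct StreamingFrameResult
{
    double LoadingMs = 0.0;        // Time spent on async asset loading.
    double LevelStreamingMs = 0.0; // Time spent on level streaming.
};

StreamingFrameResult RunUnifiedStreaming(double BudgetMs,
                                         double FirstLoadingPassMs, // includes any hitch
                                         double LevelStreamingDemandMs,
                                         double ExtraLoadingDemandMs)
{
    StreamingFrameResult Result;

    // 1) Async loading runs first; a hitch here eats into the shared budget.
    Result.LoadingMs = FirstLoadingPassMs;
    double Remaining = std::max(0.0, BudgetMs - FirstLoadingPassMs);

    // 2) Level streaming only gets whatever is left.
    Result.LevelStreamingMs = std::min(LevelStreamingDemandMs, Remaining);
    Remaining -= Result.LevelStreamingMs;

    // 3) Time level streaming did not use goes back to processing loaded assets.
    Result.LoadingMs += std::min(ExtraLoadingDemandMs, Remaining);
    return Result;
}
```

With a 5 ms budget, a calm frame (1 ms of loading, 1 ms of level streaming) hands the remaining 3 ms back to loading, while a 4.5 ms loading hitch leaves level streaming only 0.5 ms.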

This is a major step towards excellent (open world) streaming that I am excited to see evolve into Production Ready as soon as possible.

Chaos Core

Epic worked on the following Core Solver optimizations:

Partial sleeping islands
Scene query improvements
Simulation initialization improvements
Multithreaded collision detection and solving
Multithreaded island generation
Network physics development

There weren’t many specific performance-related details on the above improvements. Here are some other interesting notes I found:

Added a cvar (p.Chaos.PreviewWorld.DebugDraw.Enabled) to allow enabling/disabling chaos debug draw on preview worlds
Added experimental asynchronous execution of Dataflow graphs (It can be toggled on from the Evaluation button options menu)
Enable some p.Chaos cvars in Shipping so they can be used to tweak game-specific performance.
Physics Replication LOD (Experimental)
- Interface that allows for custom LOD solutions.
- A base implementation of an LOD solution, disabled by default. Transitions replicated physics objects between replicated timelines and replication modes (predictive interpolation to resimulation) based on distance from focal points / particles.
- Project Settings and CVars for settings (p.ReplicationLOD)
- API to register focal points / particles in the LOD via AActor, UPrimitiveComponent or directly via the LOD interface on both Game Thread and Physics Thread.
- Option to register autonomous proxy in LOD via the NetworkPhysicsSettingsComponent.

Core

Add an experimental thread sanitizer that can be run on the Windows platform.
- Thread Sanitizers are helpful in debugging data races in multithreaded programming. There is no further information on this at the time of writing.
Add LoadAsync wrappers to TSoftObjectPtr and TSoftClassPtr to better expose it as an alternative to LoadSynchronous
Add a local queue in the scheduler for the game thread to improve its task queuing efficiency
- Expect this to improve TaskGraph performance, which also handles all Ticks etc. on the GameThread.
FileHelper: Add LoadFileInBlocks for TryHashFile and other similar performant reading of files.
WorldPartition::GetStreamingPerformance() now reports streaming performance also for non blocking cells/sources.
- Poor streaming performance caused by blocking cells/sources take preference over poor streaming performance from non blocking cells/sources.
- The thresholds for reporting streaming performance from non-blocking sources can be configured via “wp.Runtime.SlowStreamingRatio” and “wp.Runtime.SlowStreamingWarningFactor”.
- GetStreamingPerformance also reports an additional enum for streaming performance (“Immediate”) that triggers when sources are within unloaded cell content bounds.
Added a CVar: LevelStreaming.LevelStreamingPriorityBias that can be used to offset level streaming LoadPackageAsync request priorities (added to the levels calculated StreamingPriority).
Add Async.ParallelFor.DisableOversubscription (defaults to false) cvar which specifies if parallel for requests can spawn additional threads when waiting for spawned tasks to finish.
Fix performance regression for the loader at runtime that might check for file existence on disk even for packaged builds
Added a way to override the EAllowShrinking defaults. (TArray)
Added a new define UE_UOBJECT_HASH_USES_ARRAYS that replaces FHashBucket in the UObjectHash implementation with one that uses arrays instead of sets. This reduced memory consumption by ~20%. Please note that this might reduce performance in some corner cases. This will be addressed in a future engine update.

Mover Plugin: Performance Improvements

To scale with large crowds and demanding gameplay scenarios, we’ve introduced threaded simulation that enables the Mover plugin to run asynchronously off the game thread. Input gathering and simulation are now decoupled, enabling concurrent movement updates across many actors.

While this threading model is currently limited to single-player contexts, it lays important groundwork for future support in networked scenarios. The Mover plugin continues to evolve as a flexible, high-performance solution for both player and AI movement.

Mover 2.0: Adding NavWalkingMode to support more efficient walking movement for actors using a navmesh.

Mover 2.0 has other performance improvements in this release for nav walking, state caching and memory usage that I have omitted from these highlights. Search for “Mover:” in the original release notes for more.

Gameplay

Improvements to the performance and thread safety of the tick.AllowBatchedTicks system first added in 5.5.
- This system is intended not to batch the same tick functions, but instead batch tick functions inside single TaskGraph tasks to reduce the overhead of that system instead.
Changed ticking to use ProcessUntilTasksComplete to periodically call an update function while waiting for tasks on other threads. Added the tick.IdleTaskWorkMS cvar to control this: if > 0, the game thread will spend that many milliseconds trying to process other work (like worker thread tasks) when it is idle.
Add an optional deferred component move handler (s.GroupedComponentMovement.Enable) on the UWorld to allow scene components to request movement to be propagated later on the frame as a larger group of updates to help improve performance.
- This sounds incredibly useful for automatically deferring FScopedMovementUpdate calls so they are batched together later in the frame.
- Character & Projectile Movement Components do not use this grouped update behavior at this time as it requires specifying the scoped movement with EScopedUpdate::DeferredGroupUpdates instead of the current EScopedUpdate::DeferredUpdates
Add LevelStreaming.VisibilityPrioritySort to change the order that it processes level streaming adds and removes
Added a new (experimental) TaskSyncManager to the engine which allows registration of globally accessible tick functions that can be used to synchronize different runtime systems and efficiently batch per-frame updates.
Added support for manual task dispatch to FTickFunction so functions can wait to be triggered halfway through a tick group. If tick.CreateTaskSyncManager is enabled, it will create the manager at startup and register sync points that are defined in the Task Synchronization section of project settings. RegisterWorkHandle can be used to request work at a specific sync point, and RegisterTickGroupWorkHandle can be used to request work to run on the game thread during a tick group. The TickTaskManager was modified to support this system and other methods for improving tick performance
Introduced a shared time budget for ProcessAsyncLoading and UpdateLevelStreaming that can be enabled with s.UseUnifiedTimeBudgetForStreaming 1. When this is set, it runs the async asset and level streaming at the end of the frame from HandleUnifiedStreaming which also handles high priority streaming. UpdateLevelStreaming will have less time if there are hitches in ProcessAsyncLoading, and time unused by UpdateLevelStreaming will be used to process more loaded assets.
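
The grouped component movement handler (s.GroupedComponentMovement.Enable) batches movement requests so they are propagated later in the frame as one larger group. Below is a minimal, hypothetical model of why that helps. These classes are not engine types, and per the notes the engine opts in through EScopedUpdate::DeferredGroupUpdates rather than anything shown here:

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical model of deferred, grouped movement.
struct Vec3 { float X, Y, Z; };

struct MockComponent
{
    Vec3 Location{0, 0, 0};
    int ExpensiveUpdates = 0; // Stand-in for bounds/overlap/render-state updates.

    void ApplyMove(const Vec3& NewLocation)
    {
        Location = NewLocation;
        ++ExpensiveUpdates;
    }
};

class GroupedMovementHandler
{
public:
    // Record a relative move; accumulate if the component already has one pending.
    void RequestMove(MockComponent& Component, const Vec3& Delta)
    {
        auto It = Pending.find(&Component);
        if (It == Pending.end())
        {
            It = Pending.emplace(&Component, Component.Location).first;
        }
        It->second.X += Delta.X;
        It->second.Y += Delta.Y;
        It->second.Z += Delta.Z;
    }

    // Apply all pending moves as one group later in the frame: one expensive
    // update per component, no matter how many requests it received.
    std::size_t Flush()
    {
        const std::size_t Count = Pending.size();
        for (auto& Entry : Pending)
        {
            Entry.first->ApplyMove(Entry.second);
        }
        Pending.clear();
        return Count;
    }

private:
    std::unordered_map<MockComponent*, Vec3> Pending;
};
```

In this model, ten move requests on one component result in a single expensive update at flush time instead of ten.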

Blueprint

Blueprints will now have their Tick function disabled if it’s empty.
- A nice to have automatic optimization that will reduce overall overhead when you have not done this basic clean up yourself.

Mass

Allow Mass phases to run outside the game thread
Mass can now optionally auto-balance parallel queries. This comes at a slight scheduling overhead, but can improve performance for processors that don’t have even performance for all their chunks.
Made instances with settings of MaxActorDistance == 0 never hydrate, including as a response to physics queries. This makes never-hydrating InstancedActors a lot cheaper. The new feature is disabled by default and is controlled by IA.EnableCanHydrateLogic console variable.
Made FMassProcessingPhase.bRunInParallelMode true by default.
- Also switched application of changes to mass.FullyParallel from taking place in FMassProcessingPhaseManager::OnPhaseEnd to FMassProcessingPhaseManager::OnPhaseStart
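
Auto-balancing parallel queries helps when chunks have uneven costs. The sketch below simulates, in abstract cost units, a static contiguous split versus a “next free worker takes the next chunk” split. It is a hypothetical illustration of the trade-off, not Mass's scheduler, and it ignores the scheduling overhead the notes mention:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Static split: each worker gets one contiguous, equally sized range of chunks.
double StaticMakespan(const std::vector<double>& ChunkCosts, std::size_t Workers)
{
    assert(Workers > 0);
    std::vector<double> Load(Workers, 0.0);
    const std::size_t PerWorker = (ChunkCosts.size() + Workers - 1) / Workers;
    for (std::size_t I = 0; I < ChunkCosts.size(); ++I)
    {
        Load[I / PerWorker] += ChunkCosts[I];
    }
    return *std::max_element(Load.begin(), Load.end());
}

// Balanced split: whichever worker frees up first takes the next chunk, which is
// roughly what pulling chunks from a shared atomic index achieves.
double BalancedMakespan(const std::vector<double>& ChunkCosts, std::size_t Workers)
{
    assert(Workers > 0);
    std::vector<double> Load(Workers, 0.0);
    for (double Cost : ChunkCosts)
    {
        auto Earliest = std::min_element(Load.begin(), Load.end());
        *Earliest += Cost;
    }
    return *std::max_element(Load.begin(), Load.end());
}
```

With four expensive chunks (8 units each) at the front of sixteen chunks and four workers, the static split puts all the expensive work on one worker (32 units), while the balanced split finishes in 11 units.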

Game AI

Activate PathFollowingComponent ticking only when necessary
The call to OnActorRegistered in NavigationSystem from AActor::PostRegisterAllComponents will now handle registration for the actor and all its components. Component registration to the navigation system will be ignored until all components are registered to the scene to avoid extra work since actor registration will take care of it. The component specific operations are now only used after the initial registration (e.g., component added/removed/updated).
- This change fixes an issue exposed by the new registration flow when the NavigationElement is created and pushed with all the relevant data instead of relying on the callbacks when processing the registration queue. That delay was hiding the dependencies between some components.
Navmesh - Fix: only spawn the navmesh for the world that was loaded.

AI Smart Objects

Added partial multithreading support to be able to update/use/release slots from multiple threads. SmartObject instance lifetime is still single threaded and mainly controlled by components lifetime. The functionality is off by default and can be activated by setting WITH_SMARTOBJECT_MT to 1
- With it being off by default, you must compile the engine from source to enable this in 5.6.

State Tree

Multithread access detection. Detects if two threads are accessing the same instance data. The validation can be deactivated with the cvar StateTree.RuntimeValidation.MultithreadAccessDetector
Async RunEnvQuery Task
- Sounds like we can now run environment queries asynchronously from State Trees.

StateTree Scheduled Ticks and Performance

State Trees now support scheduled ticking, significantly reducing performance overhead by avoiding unnecessary updates. Instead of ticking every frame, State Trees will now only tick when needed — such as when a task requires it, an event occurs, or a delay completes. Tasks or states can also request specific tick intervals, enabling per-state throttling. Sleeping instances will automatically wake when relevant activity resumes.

This optimization can dramatically cut down CPU usage in complex games, especially when many trees are idle. Designers can see which states or tasks will tick through updated visual indicators in the editor and can toggle scheduled ticking per asset if needed.
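Conceptually, scheduled ticking replaces "tick everything every frame" with "tick only what is due". The sketch below models that in plain C++. It is an illustration under my own assumptions, not the StateTree API. The names (ScheduledInstance, TickScheduled, SendEvent) are hypothetical. An instance either has a tick interval (per-state throttling) or sleeps until an event wakes it.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical model of "only tick when needed"; not the StateTree API.
struct ScheduledInstance
{
    std::optional<double> TickInterval; // empty = sleep until an event arrives
    double NextTickTime = 0.0;
    bool bPendingEvent = false;
    int TickCount = 0;
};

// Wakes a sleeping instance so it ticks on the next scheduler pass.
void SendEvent(ScheduledInstance& Instance)
{
    Instance.bPendingEvent = true;
}

// Ticks only instances that have an event pending or whose interval elapsed.
// Returns how many instances actually ticked this frame.
int TickScheduled(std::vector<ScheduledInstance>& Instances, double Now)
{
    int Ticked = 0;
    for (ScheduledInstance& Instance : Instances)
    {
        const bool bIntervalDue = Instance.TickInterval && Now >= Instance.NextTickTime;
        if (!Instance.bPendingEvent && !bIntervalDue)
        {
            continue; // sleeping or throttled: no work this frame
        }
        ++Instance.TickCount;
        ++Ticked;
        Instance.bPendingEvent = false;
        if (Instance.TickInterval)
        {
            Instance.NextTickTime = Now + *Instance.TickInterval;
        }
    }
    return Ticked;
}
```

The idle instance costs only a cheap check per frame in this model. An engine-side scheduler can go further and skip idle instances entirely.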

StateTree: Asynchronous Task Support

We’ve expanded asynchronous task support through the WeakExecutionContext. WeakExecutionContext is lightweight and safe to copy. It provides a simple handle for async logic. When pinned, you have full access to instance data from within the async tasks, making it safe to read or modify task state asynchronously while preventing premature garbage collection. You are responsible for making its access thread safe.
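The "weak handle you pin before use" pattern may be easier to grasp with a standard C++ analogy. In the sketch below, std::weak_ptr stands in for a weak handle such as WeakExecutionContext. It is only an analogy, not the StateTree API, and TaskInstanceData and ReportProgress are hypothetical names. Pinning keeps the data alive for the duration of the call. If the owner is already gone, the async code bails out. The mutex reflects the release note's point that thread safety remains your responsibility.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Analogy only: std::weak_ptr stands in for a weak handle such as
// WeakExecutionContext; this is not the StateTree API.
struct TaskInstanceData
{
    std::mutex Mutex; // thread safety is still your responsibility
    int Progress = 0;
};

// Called from async work. Pinning (lock) keeps the data alive for the
// duration of the call; if the owner is already gone, we bail out safely.
bool ReportProgress(const std::weak_ptr<TaskInstanceData>& WeakData, int NewProgress)
{
    std::shared_ptr<TaskInstanceData> Pinned = WeakData.lock();
    if (!Pinned)
    {
        return false;
    }
    std::lock_guard<std::mutex> Lock(Pinned->Mutex);
    Pinned->Progress = NewProgress;
    return true;
}
```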

Iris Networking System

Iris is the next-gen networking model of UE5 that is still in active development. This release includes foundational work such as:

More robust replication of dynamic and nested objects, with improved support for complex actors and FastArrays.
Improvements to network performance in high-bandwidth and low-latency conditions, with better bandwidth utilization and reduced replication latency for small objects.

Additional changes I found:

Added support for recording CSV stats in AActor::ForceNetUpdate and AActor::FlushNetDormancy. Recording is only enabled in non-Shipping builds compiled for dedicated servers.
- The new CSV Categories (Actor_FlushNetDormancy and Actor_ForceNetUpdate) are disabled by default.
- By default we record the NativeParentClass name of the Actor.
- The CVar: net.Debug.ActorClassNameTypeCSV, controls which type of class name to record:
  - 0: Record the Parent native class name of the given Actor
  - 1: Record the TopMost non-native class name of the given Actor
  - 2: Record the Actor class name
Network Insights - Display all HugeObject packet contents.

UMG

Refactored widget animations to not use a player object anymore.
- This change removes the use of UUMGSequencePlayer except in places that require backwards compatibility. These player objects, and the IMovieScenePlayer interface, are getting deprecated in favor of more lightweight “runner” structs that can be easily packed in memory. A future optimization might even be to move all these runner structures into the UMG Sequence Tick Manager.

Editor & Development Iteration

Oodle updated to version 2.9.13
- From the Oodle release notes: “This release focuses on BC7 and BC7-RDO encoding speed.” Expect 25-30% faster encoding, which is an excellent improvement for development iteration.
Remove manifest from compiled DLLs as it was causing a delay for each module loaded to query WinSxS with an out-of-process call. This saved about 10s of boot time for ~2300 dlls loaded at editor startup.
- Just imagine the collective time this will save across all devs using UE5!
Added non-shipping tracking for ResetAsyncTrace Delegates
- The delegate dispatch step inside ResetAsyncTraces is now timed, and if it’s above a defined threshold, we dump some key info about those delegates to help diagnose rare performance spikes.
Improve previewing of the Texture Streaming Pool in the Editor
- Get the correct PoolSize value from the DP cvar in Texture Streaming Stats
- Exclude Editor Resident Memory from the Runtime Resident Memory in Texture Streaming Stats
Streamable Manager error handling
- Add an Error field to FStreamableHandle.
- Add FStreamableDelegateWithHandle to more easily find and inspect handles when a load completes.
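The ResetAsyncTraces change follows a handy general pattern: time a step, and only dump diagnostic details when it exceeds a threshold. Here is a minimal standalone sketch of that pattern using std::chrono. The function name and the choice to report delegate names are my own illustration, not the engine implementation.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of "time a step and report only if it is slow".
using NamedDelegate = std::pair<std::string, std::function<void()>>;

// Runs every delegate, times the whole dispatch, and only when it exceeds
// ThresholdMs returns the delegate names so they can be logged for diagnosis.
std::vector<std::string> DispatchAndDiagnose(const std::vector<NamedDelegate>& Delegates, double ThresholdMs)
{
    const auto Start = std::chrono::steady_clock::now();
    for (const NamedDelegate& Delegate : Delegates)
    {
        Delegate.second();
    }
    const std::chrono::duration<double, std::milli> Elapsed = std::chrono::steady_clock::now() - Start;

    std::vector<std::string> SlowReport;
    if (Elapsed.count() > ThresholdMs)
    {
        for (const NamedDelegate& Delegate : Delegates)
        {
            SlowReport.push_back(Delegate.first);
        }
    }
    return SlowReport;
}
```

In the common (fast) case, this pattern only pays for two clock reads, which is why it is cheap enough to leave enabled in non-shipping builds.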

Compile Time Improvements

Add support for debugging optimized code with MSVC. Please see https://aka.ms/vcdd for more details
ICX 2025.1 and Microsoft Clang 19 HWPGO integrations

Desktop

Bugfix: Execute One Command List at a Time on Async Queues with NVIDIA Hardware
- Some NVIDIA drivers may drop barriers at the beginning of command lists executed on async queues. This can result in visual corruption.
- As a work-around, execute each command list individually on async queues when NVIDIA desktop hardware is detected. This can limit the overlap of GPU work in some cases but avoids corruption.
- New cvars were added to control this behavior on all DX12 platforms and per queue type:
  - r.D3D12.Submission.MaxExecuteBatchSize.Direct
  - r.D3D12.Submission.MaxExecuteBatchSize.Copy
  - r.D3D12.Submission.MaxExecuteBatchSize.Async
- These are automatically configured during engine startup per the explanation above.

Windows

Fixed the underreporting of Windows D3D12 texture data in LLM (Low-Level Memory Tracker)

I ended up stripping quite a few optimization and performance related notes as the list became massive, which in and of itself is amazing: 5.6 received so many optimizations! The remaining list covers the most important bits you should know if you don’t have the time to read through the entire release notes, or if you need more context on some of the sparsely explained notes.

Additionally, certain features such as Substrate are entirely omitted from the highlights, as I am personally waiting for them to be production ready before looking into them deeply.

Don’t forget to subscribe to my newsletter below to stay informed on Unreal Engine Performance & Optimization topics! And follow me on Twitter/X!

Animating in C++: Curves and Easing Functions

2025-01-07T00:00:00+00:00

There are plenty of ways to animate or interpolate things in Unreal Engine. The skeletal animation tools, for example, are incredibly powerful, but none of the available tools in Unreal are very lightweight or easy to use in C++. This is especially true for things that aren’t skeletal meshes to begin with, such as animating the radius of a gameplay ability, opening a treasure chest, or any other kind of value interpolation in your game code.

For a simple use case like opening a treasure chest, we don’t want to use advanced animation tools such as Sequencer, Control Rig, skeletal mesh animations etc. We just want to interpolate between two values, ideally non-linearly. For example, with a little bounce at the end or easing in/out of the transition. (The wobble at the end may be a little subtle in the recording, but it does add a nice touch in-game.)

In this article I demonstrate a simple animation system implementation that you can expand on with additional features. The source code is available in my Action Roguelike project on GitHub (Direct Link to Implementation Example). This project is part of my Unreal Engine C++ Course; however, its source code is open to everyone.

Note: This is not a step-by-step tutorial to code along. Instead this article explains the difficulties of animating in C++, proposes a solution and provides a walkthrough of the source code which is available on GitHub as part of the Action Roguelike project.

Problems with Animating in C++

The main issue with animating in C++ is that there is no lightweight and simple API to set this up. You will need to do something like tick your Actor or Component every frame, then either apply some math-based animation or sample a curve asset for the next value. There is a lot of boilerplate to set up, especially if you want to disable this tick conditionally (eg. only tick when the animation is active, much like TimelineComponent does).

What about TimelineComponent?

The TimelineComponent is a pretty cool implementation with a unique Blueprint Node that makes it very easy to set up curve animations in Blueprint. It is not nearly as nice to use in C++, but more importantly it has other issues which we can improve upon. A couple of problems:

ActorComponent based which adds memory overhead, spawn/initialization cost, and additional garbage collection pressure if you use these a lot
Registers one new tick per component
Much better UX for Blueprint usage than C++
Not available in every context, with it being an ActorComponent (eg. if you want to animate something inside a non-Actor class like a gameplay ability)

The TimelineComponent also doesn’t support any math-based animations (easing functions) which we could easily add to our own animation system.

Writing our Animation Subsystem

Luckily, it is pretty straightforward to create a simple C++ animation system in a Subsystem in Unreal Engine. You could expand the provided sample to include additional easing functions and other math based animations such as spring damping, etc.

All we need to do is have the (tickable) subsystem play the animation for us, pass in some data such as a curve asset or easing function to use along with a callback function (lambda) to call every animation tick. This lambda receives the current animation value that we can apply to whatever we need such as the rotation of the treasure chest lid mesh.
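Stripped of the Unreal specifics, the core of such a system is small. The following is a standalone C++ sketch of the idea, not the project’s actual URogueCurveAnimSubsystem. An evaluation function (standing in for a curve asset or easing function) maps normalized time to a value. A callback receives that value each tick, and finished animations are removed automatically. In Unreal the manager’s Tick would be driven by a tickable subsystem. All names here are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical standalone sketch; in the project this role is played by a
// tickable subsystem with curve assets and easing functions.
using EvalFunc = std::function<float(float)>;      // normalized time [0,1] -> value
using UpdateCallback = std::function<void(float)>; // receives the animated value

struct ActiveAnim
{
    EvalFunc Evaluate;
    UpdateCallback OnUpdate;
    float Duration = 1.f;
    float Time = 0.f;

    void Tick(float DeltaSeconds)
    {
        Time = std::min(Time + DeltaSeconds, Duration);
        OnUpdate(Evaluate(Time / Duration));
    }

    bool IsFinished() const { return Time >= Duration; }
};

class AnimManager
{
public:
    void Play(EvalFunc Evaluate, float Duration, UpdateCallback OnUpdate)
    {
        assert(Duration > 0.f);
        Anims.push_back({std::move(Evaluate), std::move(OnUpdate), Duration, 0.f});
    }

    // Called once per frame by the owner (the subsystem's Tick in Unreal).
    // Note: in this sketch, callbacks must not call Play() while this loop runs.
    void Tick(float DeltaSeconds)
    {
        for (ActiveAnim& Anim : Anims)
        {
            Anim.Tick(DeltaSeconds);
        }
        // Drop finished animations so nobody has to clean them up manually.
        Anims.erase(std::remove_if(Anims.begin(), Anims.end(),
                                   [](const ActiveAnim& Anim) { return Anim.IsFinished(); }),
                    Anims.end());
    }

    // When nothing is playing, the owner could stop ticking entirely.
    bool IsIdle() const { return Anims.empty(); }

private:
    std::vector<ActiveAnim> Anims;
};
```

With a single manager ticking every active animation, you register one tick in total instead of one per component. This is the advantage over TimelineComponent listed above.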

Animating with Curves

Using Curve Assets lets us trigger and control the animation logic in C++ while allowing a designer in the Unreal Editor to fine-tune the animation. Below is a usage example in the RogueTreasureChest to open the “LidMesh” based on the curve animation.

URogueCurveAnimSubsystem* AnimSubsystem = GetWorld()->GetSubsystem<URogueCurveAnimSubsystem>();

// Curve Asset, playback rate, lambda to call each animation tick
AnimSubsystem->PlayCurveAnim(LidAnimCurve, 1.f, [&](float CurrValue)
{
    LidMesh->SetRelativeRotation(FRotator(CurrValue, 0, 0));
});

If you are unfamiliar with lambdas, they work a little bit like this:

[&] captures the values outside the function so they can be accessed inside the lambda. The ampersand is the “default capture by reference” for any data used inside the lambda. In our example LidMesh must be accessible; since it is a member variable, what actually gets captured is the this pointer, so the actor must outlive the animation. We can also list specific variables: a plain name (eg. [Scale]) captures a copy, while [&Scale] captures by reference.
(float CurrValue) optional parameter(s), in our case the “CurrValue” is the value we get out of the curve asset. We use this value to drive the animation.
{ … } the body, it is the code that runs when calling the lambda inside the animation system.
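The capture rules can be checked with a small standalone C++ example. Chest is a hypothetical stand-in for the treasure chest actor, and CompareCaptureModes is an illustrative helper. The example shows that [&] on a member really captures this. It also shows that a by-reference capture sees later changes, while a by-copy capture keeps the value from when the lambda was created.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for the treasure chest actor.
struct Chest
{
    float LidPitch = 0.f;

    std::function<void(float)> MakeLidCallback()
    {
        // LidPitch is a member, so [&] actually captures the `this` pointer.
        // The Chest must therefore outlive any stored copy of this callback.
        return [&](float CurrValue) { LidPitch = CurrValue; };
    }
};

// Returns {by-reference result, by-copy result} after Scale changes.
std::pair<float, float> CompareCaptureModes()
{
    float Scale = 1.f;
    auto ByReference = [&](float Value) { return Value * Scale; }; // sees later changes
    auto ByCopy = [Scale](float Value) { return Value * Scale; };  // copies Scale now
    Scale = 2.f;
    return {ByReference(10.f), ByCopy(10.f)};
}
```

CompareCaptureModes returns {20, 10}. Because the animation system stores the lambda and calls it on later frames, capturing a local variable by reference would leave a dangling reference. Members accessed through this are fine as long as the owning object is still alive.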

The Curve Asset would look a little something like this to create a slight wobble at the end.

You can create a Curve Asset by right-clicking in your Content Browser and selecting Curve under Miscellaneous. You’ll be prompted for the curve type; the system currently supports only float curves.

To edit the curve, use middle mouse click to add new Keys.

The animation subsystem manages the updates and removes the animation once finished, which alleviates some manual bookkeeping headaches. If you wish to manually tick the animation anyway, it’s easy to do so, as in the following snippet (taken from RogueTreasureChest.cpp):

// For manual ticking, create the struct directly and keep it around as a member: FActiveCurveAnim* CurveAnimInst;
// Since it is allocated with new, you own it and must delete it when done (or hold it in a TUniquePtr instead).
CurveAnimInst = new FActiveCurveAnim(LidAnimCurve, [&](float CurrValue)
{
    LidMesh->SetRelativeRotation(FRotator(CurrValue, 0, 0));
}, 1.0f);

void ARogueTreasureChest::Tick(float DeltaSeconds)
{
    Super::Tick(DeltaSeconds);

    // Example of manually ticking the animation, useful if you need more control or want to batch specific animations yourself
    if (CurveAnimInst && CurveAnimInst->IsValid())
    {
        CurveAnimInst->Tick(DeltaSeconds);
    }
}

Animating with Math (Easing Functions)

As an alternative to curves, you can animate using math. The most common way to do this is with easing functions. This provides an even easier way to set up simple animations, as you don’t need to create or assign a curve asset in the editor.

Check out this excellent Easing Functions Cheat Sheet to help visualize the available easing functions.

Easing functions work by changing how the Alpha value used in the interpolation evolves over time, so that it is no longer linear (as it would be if you simply added DeltaTime to Alpha every frame).

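// Example of linear: Alpha advances at a constant rate (a 1 second animation here)
Alpha = FMath::Min(Alpha + DeltaTime, 1.f);
const float LinearValue = FMath::Lerp(A, B, Alpha);
// Example of Ease Out: remap Alpha before interpolating (Exp controls the strength, eg. 2.0)
Alpha = FMath::Min(Alpha + DeltaTime, 1.f);
const float ModifiedAlpha = 1.f - FMath::Pow(1.f - Alpha, Exp);
const float EasedValue = FMath::Lerp(A, B, ModifiedAlpha);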
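To experiment outside the engine, here are standalone versions of the common power-based easing curves. The function names are my own. I believe FMath::InterpEaseIn, InterpEaseOut and InterpEaseInOut in UnrealMathUtility.h follow the same power-curve idea, but treat this as an illustration rather than a copy of the engine source. Alpha is normalized time in [0, 1], and Exp controls how pronounced the curve is.

```cpp
#include <cassert>
#include <cmath>

// Standalone power-based easing curves. Alpha is normalized time in [0, 1];
// Exp controls how strong the curve is (1 = linear, 2 = quadratic, ...).
float EaseIn(float Alpha, float Exp)
{
    return std::pow(Alpha, Exp);
}

float EaseOut(float Alpha, float Exp)
{
    return 1.f - std::pow(1.f - Alpha, Exp);
}

// Ease in during the first half, ease out during the second half.
float EaseInOut(float Alpha, float Exp)
{
    assert(Alpha >= 0.f && Alpha <= 1.f);
    return Alpha < 0.5f
        ? EaseIn(Alpha * 2.f, Exp) * 0.5f
        : EaseOut(Alpha * 2.f - 1.f, Exp) * 0.5f + 0.5f;
}

float Lerp(float A, float B, float Alpha)
{
    return A + (B - A) * Alpha;
}
```

For example, Lerp(0, 90, EaseOut(0.5, 2)) gives 67.5, so halfway through the animation the lid is already three quarters open. The ease-out decelerates towards the end.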

There is a lot to say about easing functions, but I’ll instead link to this excellent talk on the subject of animating with math… Math for Game Programmers: Fast and Funky 1D Nonlinear Transformations

Additional Tips & Tricks

Normalizing Curves

A quick tip is to consider setting up your Curves normalized between 0.0-1.0 and applying any multiplication in the lambda/callback instead. This lets you re-use curves more easily and gives you a single value to set/tweak rather than shuffling around keys in the curve asset. Make sure that multiplier is exposed to Blueprint in case it needs to be fine-tuned.
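The tip boils down to a single multiply inside the callback. In this standalone sketch, MaxValue plays the role of a designer-tweakable property such as a hypothetical MaxLidPitch UPROPERTY. MakeScaledCallback is an illustrative name, not part of the project.

```cpp
#include <cassert>
#include <functional>

// Hypothetical: a normalized curve (0..1) scaled by one tweakable value,
// eg. a MaxLidPitch property exposed to designers in Unreal.
std::function<void(float)> MakeScaledCallback(float MaxValue, float& Target)
{
    // MaxValue is copied; Target is captured by reference and must outlive the callback.
    return [MaxValue, &Target](float NormalizedValue)
    {
        Target = NormalizedValue * MaxValue; // one number to tweak instead of re-keying the curve
    };
}
```

The same 0..1 curve can now drive a 110 degree chest lid and a 45 degree door just by changing MaxValue.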

Math-based Animations

Unreal’s FMath has many more built-in functions to help animate in C++. The implementation example uses FMath::InterpEaseInOut, so check out that class (UnrealMathUtility.h) for more options or search for EEasingFunc as that’s the blueprint enum used to access the available easing functions.

Runtime Curves (FRuntimeFloatCurve)

There is another great curve type available if you don’t want to have many individual curve assets in your content folders. The FRuntimeFloatCurve type lets you set up the curve data straight inside the details panel! You still have the flexibility to assign a curve asset if you change your mind later.

Keep in mind that you’ll need to change the animation system slightly, as it currently does not accept this type of curve. You could overload the Play() function on the subsystem to support that type (which may require a new struct to hold its data). Alternatively, you could store the animations using FRichCurve* instead, as that’s the type inside these curve classes that actually holds the keyframe data (FRuntimeFloatCurve, UCurveFloat, UCurveVector, UCurveLinearColor).

Closing

You now have a strong basis for creating animations in C++ to drive a wide variety of systems. There is certainly more to implement, such as looping, ping-pong playback and different value types (eg. Vectors and Colors). I will leave that up to you for now; maybe by the time you read this article the subsystem will have been expanded already! You can find the source and other interesting C++ systems in the Action Roguelike project on GitHub.

To be notified of more C++ articles like this one, subscribe to the newsletter below.

References

Implementation Example Source Code - available on GitHub
Math for Game Programmers: Fast and Funky 1D Nonlinear Transformations - I love talks by Squirrel; this one gives insight into using math curves for animating things in-game
Easing Functions Cheat Sheet - helps to visualize easing functions
Huge resource on Easing function implementations - if you wish to deep dive into easing functions
Fresh Cooked Tweens - GitHub project by Jared Cook, an excellent resource if you’re looking for a more complete implementation

Unreal Engine 5.5 Performance Highlights

2024-11-19T00:00:00+00:00

The following Highlights are taken from the Unreal Engine 5.5 Release Notes and focus primarily on real-time game performance on PC and consoles. My personal highlights have some commentary on them, and at the bottom you’ll find a raw list of changes that I found notable. There were so many changes that even at the bottom I chose not to include everything, especially if the release notes were vague on their benefit or actual improvement.

I will include a lot of the amazing new features and improvements in my Game Optimization Course!

To kick off I’m starting with some lesser known changes which include some awesome additions like batched ticks and better profiling of input latency!

Unreal Insights

Preset for “light” memory tracing. In certain scenarios it can be useful to trace detailed allocations, but without paying the cost of recording callstacks and instead rely on tags for analysis. Enable light memory tracing by starting the process with -trace=memory_light.
- Memory tracing adds a lot of overhead and data; this light mode seems to be the answer for many scenarios where you are not digging too deep but want some high-level info about memory.
Added Trace.RegionBegin & Trace.RegionEnd commands
- These commands allow developers to manually tag regions of Insights traces with custom names.
- These are now available as Blueprint nodes too, which is great for adding context when profiling game code that runs across multiple frames. For example, Garbage Collection start/end is a timing region. Level streaming spread across multiple frames is also added to Insights as a timing region.
Add ‘Copy Name To Clipboard’ context menu option.
Trace Screenshot now has a Blueprint Node
Introduced _CONDITIONAL variants to TRACE_CPUPROFILER and UE_TRACE macros.

Core/Foundation

Add StaticLoadAsset, LoadAssetAsync, and FSoftObjectPath::LoadAsync functions to make it easier to asynchronously load objects from C++.
Changes the trace marker used for denoting GameThread async flushes to now clarify if a flush of all in-flight async loads is being performed or if the game thread is flushing only a subset of all loads. The “Flush All Async Loads GT” marker makes it easier to detect and fix bad behavior since, except for a few special cases, we should never wait for all loads and instead should be specifying a subset.

Gameplay

Add a new Tick Batching system for actors and components which can be enabled by setting the tick.AllowBatchedTicks cvar. When enabled, this will group together the execution of similar actor and component ticks which improves game thread performance. Also added options like ForEachNestedTick to TickFunction to better support manual tick batching (which can be faster than the new automated batching)
- This is awesome and long overdue. It can greatly improve GT performance by making better use of the CPU cache, ticking all actors/components of the same class together.
- The ForEachNestedTick can further reduce individual tick overhead by letting you run through a simple loop and run your tick logic for all objects directly in the single function.
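The cache argument is easier to see in code. This standalone C++ sketch is a hypothetical illustration of batching, not the engine’s tick API. Instead of one virtual Tick per object scattered through memory, the per-object state for one "class" is kept contiguous and updated in a single loop. SpinnerState and SpinnerBatch are made-up names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical illustration of batching, not the engine's tick API.
struct SpinnerState
{
    float Angle = 0.f;
    float DegreesPerSecond = 90.f;
};

class SpinnerBatch
{
public:
    void Add(float DegreesPerSecond) { States.push_back({0.f, DegreesPerSecond}); }

    // One call ticks every spinner: a tight loop over contiguous data instead of
    // one dispatched tick function per object.
    void TickAll(float DeltaSeconds)
    {
        for (SpinnerState& State : States)
        {
            State.Angle = std::fmod(State.Angle + State.DegreesPerSecond * DeltaSeconds, 360.f);
        }
    }

    float GetAngle(std::size_t Index) const { return States.at(Index).Angle; }

private:
    std::vector<SpinnerState> States;
};
```

Engine-side batching groups existing tick functions rather than restructuring your data. The underlying benefit is the same: similar work runs back to back, so the code and data stay hot in the CPU caches.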

Rendering

Input latency stat computation enabled for DX11/DX12 using IDXGISwapChain::GetFrameStatistics, correlating the input reading timestamp to when the frame is handed to the display
- New command line option r.VsyncInformationInsights shows bookmarks in Unreal Insights for when input sampling happens and when the VSync event happens in the timeline.
- This makes input latency testing a lot easier.
Added support for asynchronous pipeline state caching, which is enabled by default. It can be disabled to restore the old behavior with a console variable (r.pso.EnableAsyncCacheConsolidation).
D3D12: Add mode to set stable power state on device creation instead of only during profiling. This can be useful for in-editor benchmarking on PC by reducing the influence of adaptive GPU clock rate on the frame time.
AlwaysVisible: Return the latest time for components with scene proxies that are marked as always visible rather than updating the component time for each one. Saves multiple ms of CPU time in CitySample. (Not mentioned in Release Notes, but here is the Commit on GitHub)

MegaLights (Experimental)

“MegaLights is a new Experimental feature that allows artists to add hundreds of dynamic shadow-casting lights to their scenes. Artists can now light scenes playfully without constraints or impact on performance. With MegaLights, lighting artists, for the first time, can use textured area lights with soft shadows, lighting functions, media texture playback, and volumetric shadows on consoles and PC.”

This is very exciting, and I will explore it in detail in the future since it’s still so early in development. At first glance it *might* be their own implementation of ReSTIR by Nvidia and relies on ray tracing (although HWRT does seem to be optional, but recommended). Check out the MegaLights documentation, as it already explains a lot more than I could here right now.

More Render Parallelization

In 5.4 we already saw major improvements to render threads, 5.5 continues this trend with further improvements described as follows:

“For 5.5 there are improvements to parallel translation, which issues RHI (Render Hardware Interface) tasks to translate RHI command lists into platform command lists. The impact of this change is a dramatic performance increase of up to 2x (dropping by 7ms on some platforms), reducing the number of stalls, and offering a minor reduction in drawcalls as well as small improvements to platform dynamic res and render thread time.”

“Release 5.5 also includes improvements to asynchronous RDG (Render Dependency Graph) execute tasks which benefits both critical path rendering thread time on the order of 0.4ms, as well as allowing asynchronous execution of approximately half of slate rendering.”

This is a very welcome improvement, as RenderThread and RHI Thread optimizations were historically quite difficult compared to GameThread optimizations. We don’t need to do anything to get these enabled, which is even better. Previously we often saw many stalls and idle waits on these threads. I hope we will see meaningful improvements here, but I have yet to try this out in production.

Lumen Improvements

As with nearly every release, we see further Lumen performance improvements. Their target appears to be hardware ray tracing on consoles at 60Hz, which previously wasn’t viable unless you were targeting 30Hz, so most games opted for software ray tracing on consoles. Allowing HWRT is especially great for visual quality, as software ray tracing is notoriously unstable visually in my experience.

Hardware Raytracing

HWRT in general has seen major improvements. Besides better translucency rendering, we can see performance improvements thanks to better caching and use of acceleration structures. All these improvements will affect a variety of rendering features including Lumen, MegaLights and even light baking.

Light Function Atlas

Light Function Atlas is an improvement over traditional light functions, which were relatively costly (see the quote below as to why). With this baked ‘light function atlas’ we should see significant rendering improvements. There is some extensive documentation on this which is worth a read if you’re intending to use light functions in your project.

“Light functions can only be applied to lights with their mobility set to Movable or Stationary and cannot be baked into lightmaps. Light functions follow the same expensive rendering passes as lights that cast dynamic shadows, because the light function contribution needs to be accumulated in screen space first. The light function’s second pass then evaluates the lighting in screen space. This is a sequential operation that happens on the GPU, and it takes more time due to resource synchronizations and cache flushes that happen.” - The Docs.

Niagara Lightweight Emitters (Beta)

Niagara Lightweight Emitters. A more limited particle (stateless) emitter which should significantly reduce overhead when running many simple emitters. These should be very interesting for simple VFX such as light flares or other ambient effects such as dust or sparks. I will absolutely cover these in my Optimization course in the future, for now they are still in Beta.

Check out the docs as they explain some of their limitations including which modules can be used.

Niagara Data Channels

Niagara Data Channels allow for events to run Niagara logic which can be great for improving impact FX. This can easily instance impact decals and spawn multiple impact sparks at multiple places using a single Niagara system. These are now production ready in 5.5 and well worth a try if you are looking to manage more short lived FX, meshes or decals.

Some immediate benefits: these impacts become much cheaper to spawn/destroy, and it lets us more easily instance certain meshes such as (Mesh) Decals. We’ll create far fewer short-lived components, reducing the cost of instantiation and eventual Garbage Collection.

If you’re looking for an example, Lyra has an example of this where they manage one Niagara System per weapon to handle impact VFX through these Data Channels. You can see how they easily instance their (Mesh) decals.

World Partition - Static Lighting (Experimental)

Seems like light baking isn’t dead yet! Lumen is still expensive and not always viable. Allowing baked lighting into world partition levels is a very interesting improvement. It requires r.AllowStaticLightingInWorldPartitionMaps=1 to be enabled in DefaultEngine.ini

Instanced Actors

“Instanced Actors is a new feature designed to reduce the overhead of having too many actors in your game world. It does so by replacing actors with Mass entities and converts on-the-fly between actors and entities (called hydration/dehydration), providing a lot more performance out of densely populated open world environments. The conversion is controlled by the Mass LOD system using distance to viewer logic, and physics traces can be used to trigger hydration as well.”

“This works best when you have many actors using the same mesh, for example rocks and trees in large environments.”

The Instanced Actors feature is potentially huge for many projects, as high Actor counts in your level have all sorts of bad side effects (including the infamous traversal stutters during level streaming; I *hope* this can help reduce those but have yet to try it in production).

In my understanding, this will (eventually) replace the LightWeightActor which never got much attention since it was introduced in 5.0.

Mutable - Customizable Characters and Meshes (Beta)

Another excellent new feature is the merging of skeletal meshes at runtime in a significantly better way than what was previously possible. Mutable generates dynamic skeletal meshes, materials and textures at runtime for creating character customization systems and dynamic content.

Mesh and texture merging to reduce draw calls.
Morph baking to reduce GPU load.
Baked texture effects such as layering and decal projection to reduce GPU load.

Nanite Mesh Texture Color Painting

We are finally getting an alternative to Vertex Painting for Nanite! It’s not directly a performance improvement, but a potentially significant workflow improvement that will affect rendering optimization possibilities.

Where previously you needed to rely on Decals to get variation back into your levels after moving to a Nanite workflow, you can now opt for painting into textures (rather than directly into vertices, which is how we used to add variation with Vertex Painting) to get this traditional workflow back with Nanite!

Misc. Changes

There are so many more improvements and optimizations that I can’t comment on all of them. Some are still very exciting, such as the task system fixes that remove random spikes, or the improved async load flushing, which I’ve seen be an issue in so many projects.

Core/Foundation

Fix potential deadlock and reduce latency spikes in the task system
Changed UWorld::BlockTillLevelStreamingCompleted implementation to no longer flush all in-flight async loads globally and instead only flush outstanding streaming level async requests. In large projects this can save significant amounts of time entering PIE. Specifically,
- ULevelStreaming now provides a protected member AsyncRequestIDs to keep track of async loads issued when loading a level. During OnLoadingFinished, AsyncRequestIDs will be cleared.
- If a child class of ULevelStreaming has not recorded any async loads in AsyncRequestIDs, we fallback to flushing all async loads as before since we can’t know if implementers are relying on the past behaviour of a forced flush of all async loads.
- CVar s.World.ForceFlushAllAsyncLoadsDuringLevelStreaming has been added allowing one to revert back to old flushing behavior temporarily while work to track necessary loads can be done.
Add a RunCommandlet console command. Allows for faster iteration when debugging commandlets in the editor (e.g. via hot reload)
Add a thread-safe ref-counting mechanism to UObjects. Make TStrongObjectPtr more light-weight and usable on any thread by using ref-count instead of FGCObject. Add a pinning API to WeakObjectPtr so they can be converted safely to StrongObjectPtr from any thread. Make delegate broadcast thread-safe when used with UObjects by pinning during broadcast for non game-thread.
The old task graph API now uses the new task system under the hood to improve scheduling behavior.
Provide better API for AssetManager and StreamableManager to allow additional performance optimizations. Let the user pass TArray instead of the TSet into GetPrimaryAssetLoadList which let them avoid creating of unnecessary array copy when passing the list to AsyncLoading.
Replaced persistent auxiliary memory with a new Persistent Linear Allocator. Some persistent UObject allocations were moved to it to save memory.
Expose GC time interval parameters inside UEngine. Allow overriding the GC frame budget.
Replace busy wait APIs with oversubscription to fix common deadlocks and reduce CPU usage.
UnrealMathSSE cleanups enabled by having SSE4.2 min spec; also some UnrealMathNEON cleanups.
Improved performance of FMallocBinned2 and FMallocBinned3
Optimized memory footprint of FMallocBinned2 and FMallocBinned3
Fix -execcmd parsing to allow multiple instances
Added a -setby= option to DumpCVars, to filter on how they were set, like “DumpCVars r. -setby=DeviceProfile” will show all rendering (r. prefix) that were last set by a DeviceProfile
Add cvars VeryLargePageAllocator.MaxCommittedPageCountDefault and VeryLargePageAllocator.MaxCommittedPageCountSmallPool to limit the number of large pages committed for each pool. Since VLPA very rarely releases pages, this avoids the situation where VLPA permanently holds too much memory, leaving less for other large allocations or rendering etc.
Added optimized single element TArray Remove* overloads which don’t take a count.
Improves DoesPackageExistEx by enabling it to use the AssetRegistry when available, avoiding costly OS call
Compilation: Add support for the OptimizationLevel param for Clang-CL (so -Oz is used for OptimizeForSize etc). This includes optimal flags for PGO, since -Os tends to be the fastest option there (in addition to being smaller)
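The BlockTillLevelStreamingCompleted change boils down to "wait only for what you asked for, and fall back to the old behavior when you don’t know". Here is a hypothetical standalone model of that decision. AsyncLoadQueue and BlockTillLevelLoaded are illustrative names, and erasing an ID stands in for waiting on that request. The real change lives in UWorld::BlockTillLevelStreamingCompleted and ULevelStreaming’s AsyncRequestIDs.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical model, not engine code: in-flight async loads tracked by ID.
struct AsyncLoadQueue
{
    std::set<int> InFlight;

    // Blocks only on the listed requests; unrelated loads keep streaming.
    void FlushRequests(const std::vector<int>& RequestIDs)
    {
        for (int ID : RequestIDs)
        {
            InFlight.erase(ID); // stands in for "wait until this request completes"
        }
    }

    // Blocks on every in-flight load: the expensive old behavior.
    void FlushAll() { InFlight.clear(); }
};

// Flush what the level recorded; fall back to a full flush when nothing was
// recorded, since we can't know which loads the level relies on.
void BlockTillLevelLoaded(AsyncLoadQueue& Queue, const std::vector<int>& RecordedRequestIDs)
{
    if (RecordedRequestIDs.empty())
    {
        Queue.FlushAll();
    }
    else
    {
        Queue.FlushRequests(RecordedRequestIDs);
    }
}
```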

Gameplay

A variety of tick related improvements, including the Batched ticking mentioned earlier in this post which are fantastic additions.

Deprecated FTickableObjectBase::IsAllowedToTick because it was slow and redundant with the existing IsTickable function. The new SetTickableTickType function is a more efficient and safer way to dynamically disable tick
As part of the performance improvements to world ticking, static level collections will no longer be created by default. These were only used by the disabled client world duplication feature (but they can be created by setting s.World.CreateStaticLevelCollection)
Several performance improvements to world ticking, especially when using world partition

Rendering

Added ECVF_Scalability flag to r.Shadow.NaniteLODBias
Add the ECVF_Scalability flag to foliage.LODDistanceScale
DirectionalLight : Add a setter for AtmosphereSunDiskColorScale on the proxy so we don’t need to fully recreate the renderstate each time it changes. This avoids 20ms spikes on the render thread
Cleaned up and reduced tonemap shader permutations for faster compilation.
Added DumpMaterialInfo commandlet, which writes a CSV with properties of all matching materials to disk.
Added a project setting, r.GPUSkin.AlwaysUseDeformerForUnlimitedBoneInfluences, that allows you to enable Unlimited Bone Influences in a project without compiling extra shader permutations for GPU skinning. This saves runtime memory, disk space and shader compilation time. When the setting is enabled, any mesh LODs using Unlimited Bone Influences that don’t have a deformer assigned will use the DeformerGraph plugin’s default deformer. This ensures that UBI meshes are always rendered with a deformer, and therefore the GPU skinning permutations for UBI aren’t needed. Also added a per-LOD setting that allows users to disable mesh deformers on a specific LOD, which could be useful for controlling performance, e.g. disabling an expensive deformer on lower LODs. Some changes to functions on USkinnedMeshComponent lay the foundations for having different deformers on different LODs as well.
Cleanup r.MinScreenRadiusForCSMDepth which is not used anymore, r.Shadow.RadiusThreshold is now used for culling shadow casters.
Add basic DX12 Work Graph support. For this first pass there is no exposed RHI functionality for directly dispatching a work graph. Instead shader bundles have been extended to support a work graph based implementation. Nanite compute materials now can use work graph shader bundles on D3D12 when r.Nanite.AllowWorkGraphMaterials and r.Nanite.Bundle.Shading are both set. Both of these default to off at the moment.
Add instance culling for decal passes so that HISM decals now work instead of only the first decal instance being visible.
Implement a faster batched path for translucency lighting volume injection. Added a more accurate RectLight integration for translucency light volume (to both paths).

Shadows

[VSM] Added counters to Unreal Insights tracing. Requires both VSM stats and Insights counters to be enabled. (r.Shadow.Virtual.ShowStats 1 and trace.enable counters)
Perform PresizeSubjectPrimitiveArrays of whole scene shadows once per task instead of redundantly per packet for improved performance. Thanks to CDPR for this contribution.
Add a separate cvar to control how long unreferenced VSM lights live - r.Shadow.Virtual.Cache.MaxLightAgeSinceLastRequest - separate from the per-page ages. Keeping VSM lights around too long can cause too much bloat in the page table sizes and processing, reducing performance in various page-table-related passes (clearing, etc).
[VSM] Adapted debug viewmodes to better show local lights. Visualization modes now show a composite of all lights by default, and change to showing individual lights when one is selected in editor or by name. With r.shadow.virtual.visualize.nextlight, you can select the next light for visualization. When VSM visualization is enabled, one pass projection is now turned off, as it is incompatible with the debug output in the projection shader.
Refactor virtual shadow map invalidations to improve instance culling performance.
Changes to how WPO distance disable is handled in the virtual shadow map pass.
- See r.Shadow.Virtual.Clipmap.WPODisableDistance.LodBias and associated notes for the difference in WPO handling in shadow passes.

Lumen

Lumen received a lot of performance changes. They are pretty technical and mostly automatic, but I’ve included them here as they mention specific passes where you should see improvements.

Added foliage specific cutoff for screen probe importance sampled specular. This can improve perf on consoles depending on the settings, scene and resolution.
Move hit velocity calculations a bit further down in the shader in order to reduce the number of VGPRs. This improves shader occupancy % in the LumenScreenProbeHardwareRaytracing pass.
New Lumen Reflection denoiser. It’s sharper, faster, has less noise and has less ghosting.
Don’t build the ray tracing light grid if it’s not used by Lumen. Saves performance when HWRT is used with Lumen.
Implement inline AHS support for Lumen on certain platforms. This speeds up AHS handling in Lumen.
Run AHS only for meshes with some sections casting shadows. Fully disabled shadows can be filtered out at an instance level, but GI and reflection passes still need to run AHS on those sections.
Overlap Radiance Cache updates (opaque and translucent). Those two passes have low GPU utilization, so it’s a pretty good optimization where translucent cache traces become almost free.
Optimize ray tracing performance by pulling out Surface Cache Alpha Masking code to a permutation, which saves some VGPRs in tracing passes.

Materials

Added the “Automatically set Material usage flags in editor default” project setting to enable/disable making new Materials automatically set usage flags.
The various recent improvements to the shader compilation pipeline means that there are a number of transformations that shader code undergoes before making it to the runtime (deduplication, deadstripping, comment stripping, removal of line directives, etc.). As such it’s not always obvious when looking at a shader in a capture (RenderDoc or similar) what it is and how it was generated. To improve this a DebugHash_ comment is now added to the top of the final shader code passed to the compiler, as well as exporting a DebugHash_.txt file alongside ShaderDebugInfo for any compiled permutation. With both of these changes it’s now possible to quickly find the dumped debug info for whatever shader you are looking at in a capture by pasting the contents of the above comment into Everything (or whatever other file search mechanism you prefer). Note that this requires both symbol and debug info export to be enabled
Updated DirectXShaderCompiler (DXC) to version release-1.8.2403.

Nanite

The Nanite streaming pool is now allocated as a reserved resource (r.Nanite.Streaming.ReservedResources) on RHIs that support it. This allows it to resize without a large memory spike from temporarily having two versions of the buffer in memory.
Added the ability for the streaming pool size (r.Nanite.Streaming.StreamingPoolSize) to be adjusted at runtime, for instance in game graphics settings.
The Nanite streamer now adjusts the global quality target dynamically when the streaming pool is being overcommitted. This makes it converge to more uniform quality across the screen in those scenarios.
Disable async rasterization for Lumen Mesh Card pass and Nanite custom depth pass as it was causing large stalls while waiting for previously scheduled work in the async queue to finish.
Improved performance of Nanite tessellation patch splitter and rasterization shaders on console platforms.
Optimization: Added sorting of Nanite rasterizers (r.Nanite.RasterSort) to increase depth rejection rates for masked and PDO materials.

Niagara

Some improvements to the HWRT async traces within Niagara. Adds support for inline HW traces, where supported, through a compute shader (in some artificial tests it results in 50% performance improvement, but results will vary). Also fixes up collision group masking.
Don’t forget to check out the Niagara lightweight emitters & Data Channels!

Post Processing

Added ECVF_Scalability to r.LUT.Size. The default remains 32.
Added Medium-High TAA mode (3): equivalent filtering quality to Medium with added anti-ghosting. Slightly slower than Medium (1), much faster than High (2).
[Engine Content] Set the bloom kernel to default to 512 px. This comes up often as an opportunity for optimization.

Animation

Small performance improvements for motion matching

Landscape

Added a system to invalidate VSM pages when using (non-Nanite) landscape, to hide shadow artifacts induced by the vertex morphing system of standard landscape. It relies on pre-computing max height deltas from mip to mip for every landscape component. Invalidation occurs when the evaluated max delta between the heights at the LOD value that was active when the VSM was last cached and the heights at the current LOD value (for a given persistent view) is large enough, based on a height threshold that is tweakable per landscape and overridable per landscape proxy. Invalidation doesn’t occur when Nanite landscape is used. The invalidation rate decreases as the LOD value goes up, controlled by a screen size parameter in the landscape actor (overridable per proxy) below which no invalidation will occur. This avoids over-invalidating VSM at higher LOD values, since they tend to occupy less screen real estate and therefore don’t need perfect shadows.
Added per landscape (overridable per-proxy) shadow map bias to help with this problem too
Added 3 non-shipping CVars to help tweak those 3 parameters in-game (landscape.OverrideNonNaniteVirtualShadowMapConstantDepthBiasOverride, landscape.OverrideNonNaniteVirtualShadowMapInvalidationHeightErrorThreshold, landscape.OverrideNonNaniteVirtualShadowMapInvalidationScreenSizeLimit)
- The whole invalidation system can be enabled/disabled via CVar landscape.AllowNonNaniteVirtualShadowMapInvalidation
- Added another CVar (landscape.NonNaniteVirtualShadowMapInvalidationLODAttenuationExponent) to tweak the screen-size-dependent invalidation rate curve shape
Fixed the landscape.DumpLODs command: it now works without a parameter and can be used several times.
Removed redundant calls to SupportsLandscapeEditing when ticking landscape proxies for grass. This is to avoid O(N^2) complexity when iterating on landscape proxies for ticking grass.
Made landscape collision settings overridable per-proxy

Networking

Added CSV Profiling Stat tracking for
- Average Jitter (milliseconds)
- Packet Loss Percentage (In/Out)
Added CSV profiling markers to Oodle
- when processing incoming & outgoing packets

NavInvokers

Avoid reserving local containers every frame.
Now using a map instead of an array to avoid high costs as invoker usage scales up.
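The general trade-off behind this change can be illustrated in standard C++. This is not the engine’s NavInvoker code; `FInvokerData`, `FInvokerArray` and `FInvokerMap` are hypothetical names. Finding an entry in an array is a linear search, so register/unregister cost grows with the invoker count, while a hash map keyed by id stays roughly constant on average.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical invoker id and payload, for illustration only.
using FInvokerId = std::uint32_t;

struct FInvokerData {
    float GenerationRadius = 0.f;
    float RemovalRadius = 0.f;
};

// Array version: every register/unregister performs a linear search, O(n).
struct FInvokerArray {
    std::vector<std::pair<FInvokerId, FInvokerData>> Entries;

    void Register(FInvokerId Id, const FInvokerData& Data) {
        auto It = Find(Id);
        if (It != Entries.end()) {
            It->second = Data;  // already registered: update in place
        } else {
            Entries.emplace_back(Id, Data);
        }
    }

    void Unregister(FInvokerId Id) {
        auto It = Find(Id);
        if (It != Entries.end()) {
            *It = Entries.back();  // swap-remove: order is not preserved
            Entries.pop_back();
        }
    }

private:
    std::vector<std::pair<FInvokerId, FInvokerData>>::iterator Find(FInvokerId Id) {
        return std::find_if(Entries.begin(), Entries.end(),
                            [Id](const auto& Entry) { return Entry.first == Id; });
    }
};

// Map version: average O(1) register/unregister regardless of invoker count.
struct FInvokerMap {
    std::unordered_map<FInvokerId, FInvokerData> Entries;

    void Register(FInvokerId Id, const FInvokerData& Data) { Entries[Id] = Data; }  // insert or update
    void Unregister(FInvokerId Id) { Entries.erase(Id); }
};
```

Both containers end up holding the same data; only the cost of each lookup differs once many invokers are registered.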

Chaos

Chaos: Added a CVar for joint SIMD in scene simulation. Use “p.Chaos.Solver.Joint.UseSimd” to control whether the joint solver uses SIMD. It might improve solver performance by up to 15%.
Implemented Quality Level Min Lod for Chaos Cloth Assets, which can be enabled via the Engine.ini file. This matches how Quality Level Min Lod works for Static Meshes and Skeletal Meshes. The cvar p.ClothAsset.MinLodQualityLevel can be set in per platform ini files to manage MinLodQualityLevel on a per-platform basis.
Reduced Geometry Collection physics proxy runtime memory footprint by better packing data structures.

Note: There are even more release notes available that would fall under the performance or optimization umbrella but that lacked proper context and/or are too niche to be notable.

And finally, be sure to check out my Game Optimization Course for a huge list of lessons on optimization tricks while guiding you through the process of profiling and optimizing your game projects! There I will have a chance to go into much greater detail on all these improvements and features in video lessons and detailed text explanations…

Setting up PSO Precaching & Bundled PSOs for Unreal Engine

2023-10-18T00:00:00+00:00

In recent years DirectX 12 games have gotten a bad rep for shader stutters. The most common issue we see discussed at launch is due to a lack of pre-compiling Pipeline State Objects. These PSOs (required by the GPU) need to be compiled on the CPU if not already cached on the local machine and will cause hitches as they may cost anywhere from a few milliseconds to several hundreds of milliseconds to compile before we may continue execution.

Update: A detailed video section on PSO gathering and project configuration is available in my Complete Game Optimization for Unreal Engine 5 Course! Feel free to use the written article below or check out the course which covers PSOs and MANY more essential topics for good game performance.

In short, a “PSO” tells the GPU exactly what state it must set itself to before executing certain operations such as draw calls. The PSO needs to be compiled and is GPU dependent, so this can’t be done ahead of time on certain platforms such as PC. For platforms like Xbox and PlayStation this can be done during cooking of the project, as the hardware is known ahead of time. This explains why certain game releases only suffer from hitch-related issues on PC and not on consoles.

“Earlier graphics APIs, such as Direct3D 11, needed to make dozens of separate calls to configure GPU parameters on the fly before issuing draw calls. More recent graphics APIs, such as Direct3D 12 (D3D12), Vulkan, and Metal, support using packages of pre-configured GPU state information, called Pipeline State Objects (PSOs), to change GPU states more quickly.

Although this greatly improves rendering efficiency, generating a new PSO on-demand can take 100 or more milliseconds, as the application has to configure every possible parameter. This makes it necessary to generate PSOs long before they are needed for them to be efficient.” - Source: Docs

PSO Caching in Unreal Engine

Since UE 5.1, the engine ships with two systems trying to solve the same problem: PSO Precaching (5.1+) and Bundled PSOs (since early UE4). In this article I’ll explain both systems and how they currently work together.

The intent as stated by Epic Games is for the new PSO Precaching solution to replace the manual PSO gathering (or “bundled PSOs”) pipeline entirely. As of right now (UE 5.3), I did not find its coverage sufficient for a hitch-free experience, even in a relatively simple test project.

Update: Coverage has since improved as we are now in UE 5.6 and issues that I had previously such as Decal Components now have added support for Precached PSOs.

This article will cover an implementation using Action Roguelike on GitHub to give you the best starting position for your own project. I’ll mostly skip what is already covered by the docs including things like the background information on PSOs and how other APIs and platforms handle this. So I’ll be focusing on Windows DirectX 12. You *really* should read the available documentation along with this article as it provides additional details on these systems.

This screenshot (Unreal Insights) shows a game running without any handling of PSOs. The result is enormous frame spikes when objects are first seen on screen, as the PSO compilation step stalls the game until the PSO is ready to be sent to the GPU. Here that PSO took 54.1ms to compile, during which the game could not continue rendering.

Unreal Insights without caching, major frame spikes (top) and compilation tasks stalling the game (bottom). Insights bookmarks display when a new PSO is discovered (and its type graphics/compute)

PSO Precaching vs. Bundled PSO Cache

The naming of the two systems can be a bit confusing, as they go by a few names in the engine code. “PSO Precaching” refers to the new automatic runtime “just-in-time” compilation of PSOs. This system was introduced in 5.1 and is production ready with 5.3 and later.

The original system that shipped for years with UE4 requires the developer to collect PSOs manually, and these PSOs are bundled with the game executable. These bundled PSOs are then compiled when the game first launches, for example in the main menu. You can call these Bundled PSOs or Recorded PSOs. In C++ they are often referenced as ShaderPipelineCache in the engine source.

I’ll cover the configuration settings and my discoveries for both systems below.

Note: Epic is no longer performing the manual recording step for their PSOs and rely entirely on Precaching. That said, their game has a lot of user generated content which can’t use the bundled PSOs. They *might* still have their old recorded PSOs included with the installation (unconfirmed).

Fortnite’s load screen is said to be about 15 seconds longer on first load due to Precaching. Keep in mind that otherwise you would have to compile the bundled PSOs in your main menu, so with Precaching the moment of compilation has simply moved. Precaching also only compiles the PSOs used by the level being loaded, whereas bundled PSOs compile the entire game unless you apply Masking.

How does PSO Precaching work?

PSO Precaching attempts to compile the PSOs ahead of time during the PostLoad() of the object that supports it. This works well for loading screens where the objects won’t be rendered yet. For in-game spawning and streaming this may be too late and compilation may not be finished when the object should be rendered on screen. There is a new feature for exactly this issue which can skip the draw call until the PSO is ready.

// Skips the draw command which is at a different stage from the Proxy Creation skip below. This may cause artifacts as part of the object could be rendered if split among different commands.
r.SkipDrawOnPSOPrecaching=0 (keep this off, it's no longer recommended by Epic)

// Primary CVAR to enable for precaching (it's at component level for "best results")
r.PSOPrecache.ProxyCreationWhenPSOReady=1 (on by default)

There are two modes available for late PSOs. The first is to skip rendering the mesh entirely, the second renders the mesh with the DefaultMaterial instead. It’s up to the developer to decide which mode has the least visual popping.

The above reflects my current best understanding of the skip-draw system and its commands (this article will be updated as I learn more about this feature). Here is a quote I found that may help clarify them.

“There is an option to skip the draw at command list building as well (r.SkipDrawOnPSOPrecaching) but it still needs to know if the PSO is still compiling or missing. The problem is that if the low level skips the draw that this could lead to visual artifacts (for example certain passes for a geometry have their PSOs compiles while other passes don’t). That’s why the skip proxy creation is pushed all the way to component level because there we know the PSOs are available for all the passes it needs to correctly render the object.” - Epic
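To make the distinction in the quote concrete, here is a small C++ model. It is a conceptual sketch only: `EPSOState`, `FPassPSO`, `PassesDrawnWithPerDrawSkip` and `CanCreateProxy` are hypothetical names, not engine types. It shows how per-draw skipping can render some passes of an object while others are still compiling, whereas component-level gating waits until every pass the object needs is ready.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical compile state of the PSO needed by one render pass.
enum class EPSOState { Missing, Compiling, Ready };

struct FPassPSO {
    std::string PassName;  // e.g. "DepthPass", "BasePass"
    EPSOState State;
};

// Per-draw skipping (in the spirit of r.SkipDrawOnPSOPrecaching): each pass
// decides on its own, so passes with a ready PSO still draw. Returns the
// passes that would render, which can be an incomplete set (visual artifacts).
std::vector<std::string> PassesDrawnWithPerDrawSkip(const std::vector<FPassPSO>& Passes) {
    std::vector<std::string> Drawn;
    for (const FPassPSO& Pass : Passes) {
        if (Pass.State == EPSOState::Ready) {
            Drawn.push_back(Pass.PassName);
        }
    }
    return Drawn;
}

// Component-level gating (in the spirit of r.PSOPrecache.ProxyCreationWhenPSOReady):
// the proxy is only created once every pass the object needs has a ready PSO.
bool CanCreateProxy(const std::vector<FPassPSO>& Passes) {
    for (const FPassPSO& Pass : Passes) {
        if (Pass.State != EPSOState::Ready) {
            return false;
        }
    }
    return true;
}
```

With one pass ready and one still compiling, the per-draw model renders a partial object while the component-level model renders nothing until both are ready.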

Make sure you read the documentation as this does a good job of covering a lot of concepts new with PSO Precaching.

Optional: Get a baseline Insights trace

It’s good practice to have a baseline when doing performance testing or other forms of optimization.

Without any changes applied, run the packaged game with -trace=default -clearPSODriverCache and the Unreal Insights session browser open (InstallFolder/Engine/Binaries/Win64/UnrealInsights.exe).

Clearing the PSOs with the specified command is essential because otherwise, the game may load compiled PSOs from a previous session stored by the GPU driver.

If your stalls are severe enough, you may not need Insights for rough comparison testing, but I still recommend it either way.

Enabling PSO Precache

PSO Precaching is very easy to use: simply add the following to your DefaultEngine.ini

[/Script/Engine.RendererSettings]
r.PSOPrecaching=1

No further preparation is required. When the packaged game loads a level for the first time, you’ll notice an increase in load time where it will compile the known PSOs. A second time loading the same level should not see this same increase in load times.

Stat PSOPrecache

You can get some in-viewport stats if you have validation enabled. You have two CVARs for this:

r.PSOPrecache.Validation=2
r.PSOPrecache.Validation.TrackMinimalPSOs=1

Display the resulting stats using the stat psoprecache console command.

This stat command is only available in builds without the WITH_EDITOR compile flag, such as packaged builds. It is not available in-editor.

Precaching in Unreal Insights

You can best see the compilation steps in the game’s load screen using Unreal Insights. Precaching adds a large number of tasks on worker threads during map loading and may increase the total load time the first time the level is loaded by the player. These compiled PSOs do get stored by the GPU driver, meaning the next time you load this level, you won’t suffer the same penalty.

To get proper stats here you do need to enable PSO Validation mentioned earlier. Don’t forget about -clearPSODriverCache to have a clean cache every run.

“PSOPrecache: Untracked” in stats and Insights are most likely global shaders and not missed material shaders. You should be able to catch these using Bundled PSOs.

Combining with Bundled PSOs

While the new system is a great improvement for games running on DX12, it will not catch everything just yet (tested in 5.4, more recent versions continue to improve on the system). If you have this enabled and still notice stutters and have confirmed this is due to PSOs (using Unreal Insights - simply look for the PSO bookmarks) then you can still manually gather the PSOs to fix these particular stutters.

Combining both systems is what I am currently doing in the Action Roguelike sample project for the best coverage. Without bundled PSOs I could not get a hitch free experience as of 5.3 since even basic components like DecalComponent are not supported at this time. In UE 5.6 (and possibly earlier versions) they have included additional coverage including UDecalComponent. You can find out by looking in code for things such as UDecalComponent::PrecachePSOs().

How to setup Bundled PSOs?

For bundled PSOs the official documentation does a pretty decent job of getting you started. I won’t repeat much of what it already covers; instead I’ll elaborate on the CVARs I discovered, suggestions for capturing PSOs manually and my findings when trying out this system. I am using the same initial setup as the official docs and modified it from there. I’m keeping it brief as I don’t want too much overlap.

The configuration steps assume Precaching is Enabled. We’ll run both systems together.

However, if you just want to confirm Bundled PSOs are working properly, it may be easier to first follow along with Precaching disabled, as this ensures consistent stutters that are easier to confirm as “fixed” after recording PSOs.

Setting up the CVARs

Add the following to DefaultEngine.ini

[DevOptions.Shaders]
NeedsShaderStableKeys=true

[/Script/Engine.RendererSettings]
r.ShaderPipelineCache.Enabled=1
; essentially a "light" mode to only capture non-precachable PSOs (the CVARs below CAN be skipped for now if you want to run purely on bundled PSOs!)
r.ShaderPipelineCache.ExcludePrecachePSO=1
; required for ExcludePrecachePSO to know which PSOs can be skipped during recording (-logPSO)
r.PSOPrecache.Validation=2
; the two CVARs above are only relevant if we want precaching enabled
r.PSOPrecaching=1
r.PSOPrecache.ProxyCreationWhenPSOReady=1

And DefaultGame.ini (may already be set)

[/Script/UnrealEd.ProjectPackagingSettings]
bShareMaterialShaderCode=True
bSharedMaterialNativeLibraries=True

Cook content to generate ShaderStableInfo

After enabling the system, you’ll need to cook the game at least once to generate the .shk files. They will be added in ActionRoguelike\Saved\Cooked\Windows\ActionRoguelike\Metadata\PipelineCaches

Copy the following two _SM6.shk files (if supporting SM6, as we do in this example) to a folder somewhere on your PC, in the example: ActionRoguelike/CollectedPSOs/

ShaderStableInfo-ActionRoguelike-PCD3D_SM6.shk
ShaderStableInfo-Global-PCD3D_SM6.shk

(copying both _SM5 and _SM6 .shk files did crash for me when converting the recorded PSOs later in this process. Luckily we’re just interested in setting up SM6 for this example)

You may need to copy the shader stable files again if you make adjustments to the project’s enabled shader permutations. Keep this in mind if your recording conversion step fails at some point in development.
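If you repeat this copy step often, it can be scripted. The sketch below uses `std::filesystem` to copy only the .shk files for one shader format (for example PCD3D_SM6), which avoids accidentally mixing in the _SM5 files. The function names are hypothetical, and the `ShaderStableInfo-<Name>-<Format>.shk` naming pattern is an assumption based on the two file names above.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <string_view>
#include <vector>

namespace fs = std::filesystem;

// True for files like "ShaderStableInfo-Global-PCD3D_SM6.shk" when Format is "PCD3D_SM6".
bool IsShaderStableFileFor(const fs::path& File, std::string_view Format) {
    assert(!Format.empty());
    if (File.extension().string() != ".shk") {
        return false;
    }
    const std::string Stem = File.stem().string();
    const std::string Prefix = "ShaderStableInfo-";
    const std::string Suffix = "-" + std::string(Format);
    const bool HasPrefix = Stem.rfind(Prefix, 0) == 0;
    const bool HasSuffix = Stem.size() >= Prefix.size() + Suffix.size() &&
                           Stem.compare(Stem.size() - Suffix.size(), Suffix.size(), Suffix) == 0;
    return HasPrefix && HasSuffix;
}

// Copies the matching .shk files from the cooked Metadata/PipelineCaches folder
// into the collection folder, overwriting stale copies. Returns the copied paths.
std::vector<fs::path> CopyShaderStableFiles(const fs::path& CookedPipelineCaches,
                                            const fs::path& CollectedPSOs,
                                            std::string_view Format) {
    fs::create_directories(CollectedPSOs);
    std::vector<fs::path> Copied;
    for (const fs::directory_entry& Entry : fs::directory_iterator(CookedPipelineCaches)) {
        if (Entry.is_regular_file() && IsShaderStableFileFor(Entry.path(), Format)) {
            const fs::path Target = CollectedPSOs / Entry.path().filename();
            fs::copy_file(Entry.path(), Target, fs::copy_options::overwrite_existing);
            Copied.push_back(Target);
        }
    }
    return Copied;
}
```

Overwriting is deliberate here: after changing shader permutations you want the fresh .shk files, not the ones from an older cook.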

Record (Some) PSOs

Now we will record some PSOs to file which we can later inject back into our next build. For the example, don’t worry about covering every possible material or shader. This process is cumulative and multiple recordings can be merged by the next step in this process.

To verify this process is working for you, remember what you did when “recording” so that you can repeat it at the end and confirm that section no longer stutters.

Launch the packaged game with -logPSO as a launch parameter. The simplest way is to make a shortcut and add it to the Target field. I run all my executables with -clearPSODriverCache so that I consistently see stutters and don’t accidentally use the GPU driver cache, which may contain compiled PSOs from an earlier run.

Quit the game and find the .rec.upipelinecache file(s) in Build/Windows/PipelineCaches/. Each run with -logPSO generates another file that can be copied into our ActionRoguelike/CollectedPSOs folder. You don’t need to delete old recordings unless you want to start from scratch, as they get merged together in the next step using the ShaderPipelineCacheTools commandlet.

Convert Recorded PSOs

The final step in recorded PSOs is to convert the individual recordings into a single .spc file that will be used by the cooker whenever the game is packaged again.

You can use the following command template to convert the recorded PSOs: (View the latest .bat on GitHub)

E:\Epic\UE_5.3\Engine\Binaries\Win64\UnrealEditor-Cmd.exe -run=ShaderPipelineCacheTools expand E:\GitHub\ActionRoguelike\CollectedPSOs\*.rec.upipelinecache E:\GitHub\ActionRoguelike\CollectedPSOs\*.shk E:\GitHub\ActionRoguelike\CollectedPSOs\PSO_ActionRoguelike_PCD3D_SM6.spc

This runs the command-line version of the editor and executes the ShaderPipelineCacheTools commandlet with the expand command, which requires the .shk (shader stable) files copied in a previous step along with all the recordings. Running this commandlet generates PSO_ActionRoguelike_PCD3D_SM6.spc (see the docs on naming this file).
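For automation, the same command can be assembled from a few paths. This is a minimal sketch, assuming the folder layout from the example above; `FPSOConvertPaths` and `BuildExpandCommand` are hypothetical helpers, and paths containing spaces would additionally need quoting.

```cpp
#include <cassert>
#include <string>

// Hypothetical inputs; values in comments mirror the article's example.
struct FPSOConvertPaths {
    std::string EngineDir;     // e.g. E:\Epic\UE_5.3
    std::string CollectedDir;  // e.g. E:\GitHub\ActionRoguelike\CollectedPSOs
    std::string ProjectName;   // e.g. ActionRoguelike
    std::string ShaderFormat;  // e.g. PCD3D_SM6
};

// Builds the ShaderPipelineCacheTools "expand" invocation: all recordings and
// all .shk files in the collection folder go in, one .spc file comes out.
std::string BuildExpandCommand(const FPSOConvertPaths& Paths) {
    assert(!Paths.EngineDir.empty() && !Paths.CollectedDir.empty());
    const std::string Exe = Paths.EngineDir + R"(\Engine\Binaries\Win64\UnrealEditor-Cmd.exe)";
    const std::string Output = Paths.CollectedDir + R"(\PSO_)" + Paths.ProjectName + "_" +
                               Paths.ShaderFormat + ".spc";
    return Exe + " -run=ShaderPipelineCacheTools expand " +
           Paths.CollectedDir + R"(\*.rec.upipelinecache )" +
           Paths.CollectedDir + R"(\*.shk )" + Output;
}
```

With the example paths, this reproduces the command template above character for character.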

Copy the generated PSO_ActionRoguelike_PCD3D_SM6.spc file to /Build/Windows/PipelineCaches/ so it can be used by the cooker the next time the game is packaged.

If you followed along with Precaching enabled, we run this system essentially in a “light” mode where it only captures PSOs not handled by precaching using the following CVARs:

r.ShaderPipelineCache.ExcludePrecachePSO=1
; Validation required to know which PSOs can be skipped during -logPSO
r.PSOPrecache.Validation=2
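Conceptually, this light mode records only the PSOs that precaching did not already account for. The sketch below models that filtering with plain hashes; `FPSOHash`, `FilterRecordedPSOs` and the `CoveredByPrecache` set (a stand-in for what validation tracks) are hypothetical and do not reflect how the engine stores PSOs internally.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a PSO's identity.
using FPSOHash = std::uint64_t;

// Keeps each encountered PSO at most once, and only if precaching did not
// already cover it, preserving first-seen order.
std::vector<FPSOHash> FilterRecordedPSOs(const std::vector<FPSOHash>& Encountered,
                                         const std::unordered_set<FPSOHash>& CoveredByPrecache) {
    std::vector<FPSOHash> ToRecord;
    std::unordered_set<FPSOHash> Seen;
    for (FPSOHash Hash : Encountered) {
        if (CoveredByPrecache.count(Hash) == 0 && Seen.insert(Hash).second) {
            ToRecord.push_back(Hash);
        }
    }
    return ToRecord;
}
```

This is why validation must be on while recording: without knowing what precaching covers, the filter would have nothing to exclude.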

Testing the PSOs

Package the game again, the generated .spc will be included in the build.

Can’t stress this enough: when testing PSOs, ALWAYS run with -clearPSODriverCache as a launch parameter, or you’ll believe you have fixed the issue while the game is simply grabbing compiled PSOs from the local GPU driver cache.

To confirm caching has worked, run the packaged game with Insights using -trace=default -clearPSODriverCache, or use “stat unitgraph” to visualize the stutters in-viewport. Within Insights you can check the Bookmarks or Log to see whether any new PSOs were encountered.

If you load the same level and perform the same gameplay actions as the baseline before making any changes, there should no longer be any PSO related stutters.

Keep in mind that the bundled PSOs need to compile when the game first boots. This starts fairly early in the process, but if you load directly into a level from launch, compilation may not be finished by the time the load screen completes. It is best to boot into the main menu and confirm in the log that compilation started and finished:

LogRHI: FShaderPipelineCache::BeginNextPrecompileCacheTask() - ActionRoguelike begining compile.
LogRHI: Display: FShaderPipelineCache starting pipeline cache 'ActionRoguelike' and enqueued 321 tasks for precompile. (cache contains 321, 321 eligible, 0 had missing shaders. 0 already compiled). BatchSize 50 and BatchTime 16.000000.
...
LogRHI: Warning: FShaderPipelineCache ActionRoguelike completed 321 tasks in 0.06s (0.91s wall time since initial open).
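When testing many builds, it can help to check this log line automatically. The sketch below extracts the cache name, task count and compile time from the completion message shown above using `std::regex`. `ParsePrecompileCompleted` is a hypothetical helper, and the pattern assumes the log wording stays as shown, which may differ between engine versions.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

struct FPrecompileSummary {
    std::string CacheName;
    int CompletedTasks = 0;
    double Seconds = 0.0;
};

// Matches lines such as:
// "LogRHI: Warning: FShaderPipelineCache ActionRoguelike completed 321 tasks in 0.06s (...)"
// Returns std::nullopt for any other line (e.g. the "starting pipeline cache" message).
std::optional<FPrecompileSummary> ParsePrecompileCompleted(const std::string& Line) {
    static const std::regex Pattern(R"(FShaderPipelineCache (\S+) completed (\d+) tasks in ([0-9.]+)s)");
    std::smatch Match;
    if (!std::regex_search(Line, Match, Pattern)) {
        return std::nullopt;
    }
    FPrecompileSummary Summary;
    Summary.CacheName = Match[1].str();
    Summary.CompletedTasks = std::stoi(Match[2].str());
    Summary.Seconds = std::stod(Match[3].str());
    return Summary;
}
```

A test harness could, for example, compare the completed count against the number of tasks reported as enqueued.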

Game Example

The full example is available as the Action Roguelike Project on GitHub. I’ll list the files included for reference below.

DefaultEngine.ini

[DevOptions.Shaders]
NeedsShaderStableKeys=true

[/Script/Engine.RendererSettings]
r.PSOPrecaching=1
; keep this active for validation with 'stat psoprecache', Insights AND required for ExcludePrecachePSO cvar
r.PSOPrecache.Validation=2
; additional detail in logging for 'stat psoprecache'
r.PSOPrecache.Validation.TrackMinimalPSOs=1
; settings below for bundled PSO steps to combine with PSO Precache
r.ShaderPipelineCache.ExcludePrecachePSO=1
r.ShaderPipelineCache.Enabled=1
; start up background compilation mode so we can run "hitchless" in a main menu (optional)
r.ShaderPipelineCache.StartupMode=2

DefaultGame.ini

[/Script/UnrealEd.ProjectPackagingSettings]
bShareMaterialShaderCode=True
bSharedMaterialNativeLibraries=True

The /CollectedPSOs/ folder in the project root contains the Cmd_ConvertPSOs.bat file to convert the collected .rec.upipelinecache files (can be multiple) and needs the .shk files copied from Saved\Cooked\Windows\ActionRoguelike\Metadata\PipelineCaches (requires at least one cook after enabling PSO CVARs)

The folder will eventually contain many .rec.upipelinecache files, as they can be aggregated together by the commandlet. This makes capturing much easier, as you don’t need to run through the full game for every capture.

The generated PSO_ActionRoguelike_PCD3D_SM6.spc file must be copied to Build/Windows/PipelineCaches every time it’s generated by the commandlet.

You can of course modify your commands to properly automate this to avoid the mistakes of forgetting to place the updated files in the correct folders. I stuck with the exact workflow as suggested by Epic’s documentation for this example.

Automation Suggestions

Handling Bundled PSOs is a lot of work compared to the new PSO Precaching, so we can only hope that it will eventually be replaced entirely, saving everyone a ton of work. Until then, I’d like to suggest some ideas for streamlining this process, as implementing them falls outside the scope of this article.

Have QA or playtesters run the game with -logPSO; ideally the generated file is automatically uploaded to a server to avoid manual work. Make sure they run on different scalability settings too, as these will create different PSOs.
Create a simple spline actor in every level that can do a flythrough to visit all locations. This might not cover everything, so keep cinematics and spawnables in mind. Perhaps these cinematics can be triggered as part of the automation after the flythrough has completed.
Have a custom map for PSO gathering. This can contain all your spawnables from gameplay, such as items and weapons.

Don’t forget to run the game on the low/medium/high/epic Scalability settings for full coverage.
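A QA or automation script could generate one recording launch per scalability level. This is a hypothetical sketch: `BuildRecordingLaunches` is not part of any tool, and it assumes the `-ExecCmds` launch parameter and the `Scalability <0-3>` console command (Low to Epic) behave as expected in your engine version.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns one launch command per scalability level, each recording PSOs
// (-logPSO) with a cleared driver cache so nothing is silently reused.
std::vector<std::string> BuildRecordingLaunches(const std::string& GameExe) {
    assert(!GameExe.empty());
    // 0 = Low, 1 = Medium, 2 = High, 3 = Epic
    const char* const Levels[] = {"0", "1", "2", "3"};
    std::vector<std::string> Commands;
    for (const char* Level : Levels) {
        Commands.push_back(GameExe + " -logPSO -clearPSODriverCache -ExecCmds=\"Scalability " +
                           Level + "\"");
    }
    return Commands;
}
```

Each run then produces its own .rec.upipelinecache file, which the commandlet merges like any other recording.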

Additional Notes

The build configuration does not affect the generated PSOs. You can use Debug/Development/Shipping build configurations for the cooked game builds to gather the PSOs in your development pipeline.

// Use “Fast” for loading screens, “Background” for UI and interactive moments
r.ShaderPipelineCache.SetBatchMode pause/fast/background/precompile
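If you switch batch modes from game code, a small mapping keeps the choice in one place. `EPrecompilePhase` and `BatchModeCommandFor` are hypothetical; the mapping simply follows the guidance above, and the returned string would be executed through whatever console-command path your project already uses.

```cpp
#include <cassert>
#include <string>

// Hypothetical game phases relevant to bundled PSO precompilation.
enum class EPrecompilePhase { LoadingScreen, Interactive };

// "fast" while a loading screen hides the cost, "background" while the
// player can interact (menus, UI) so frame times stay smooth.
std::string BatchModeCommandFor(EPrecompilePhase Phase) {
    switch (Phase) {
        case EPrecompilePhase::LoadingScreen:
            return "r.ShaderPipelineCache.SetBatchMode fast";
        case EPrecompilePhase::Interactive:
            return "r.ShaderPipelineCache.SetBatchMode background";
    }
    return "r.ShaderPipelineCache.SetBatchMode background";  // fallback for unexpected values
}
```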

You can expose the number of remaining precompiles from Bundled PSOs to display some number or percentage in your main menu:

FShaderPipelineCache::NumPrecompilesRemaining()
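NumPrecompilesRemaining() only reports what is left, so a percentage needs a total to compare against. A minimal sketch, assuming you cache the first non-zero remaining count as the total once precompilation has started; `PrecompilePercent` is a hypothetical helper you would feed with that cached total and the current remaining count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Converts a "remaining" count into a 0-100 percentage for a menu widget.
// Remaining is clamped so a late increase can never produce a negative value.
int PrecompilePercent(std::uint32_t Remaining, std::uint32_t InitialTotal) {
    if (InitialTotal == 0) {
        return 100;  // nothing to compile
    }
    const std::uint32_t Clamped = std::min(Remaining, InitialTotal);
    return static_cast<int>(100ull * (InitialTotal - Clamped) / InitialTotal);
}
```

Integer division rounds down, so the widget only shows 100 once every precompile has finished.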

There are many more CVARs available in the different PSO related code files:

RenderCore/ShaderPipelineCache.cpp
Engine/PSOPrecache.cpp

Niagara has its own logic and control CVARs for PSOs such as fx.Niagara.Emitter.ComputePSOPrecacheMode but I have not worked with any of those settings at this time.

For UE4.27 Developers: Niagara lacks some support for proper PSO coverage. I’ve been told some users had to backport several commits to improve PSO handling in UE4.27. For your info and further investigation, here are those commits (must be logged in to view):

Some further info for those with UDN access.

Important Note for Nvidia GPUs

As of 2 September 2025, according to Epic, an Nvidia driver update will change the PSO file extension. This will break the -clearPSODriverCache command, which you use to clear your local cache when testing PSO coverage. This is fixed in UE 5.6, but any earlier versions will have this issue.

One possible workaround is to disable the Shader Cache directly in the Nvidia Control Panel, or to apply Epic’s fix to your engine build if it is older than 5.6. The patch is available here. (requires an Epic-connected GitHub account to view)

Closing

This article aims to fill some of the knowledge gaps left by the docs and release notes. When Precaching was announced, it took me longer than I care to admit to get it fully working. This is partially because its claims are bigger than what it delivers: even the simple example project can’t seem to reach full coverage with Precaching alone and *needs* the old system for a solid 100% experience. I’m confident this will be addressed and improved in future versions. Until then, combining old and new seems like the way to go.

All feedback on this post is most welcome! I’m sure I’ve still missed something or might confuse people with certain steps. I want this article to save the time I had to spend figuring this all out. (Thankfully 5.3 added more info to help get us started!)


Unreal Engine 5 C++ Guide: Pointers, Macros, Delegates & More

2023-02-14T00:00:00+00:00

Getting started with Unreal Engine C++ can be a bit of a struggle. The resources online have no clear path to follow or fail to explain the Unrealisms you’ll encounter. In this article, I’ll attempt to give you an overview of many unique aspects of Unreal’s C++ (TObjectPtr, Delegates, etc.) and briefly go over some of the native C++ features (pointers, macros, interfaces) and how they are used in the context of Unreal Engine. It’s a compilation of the many different concepts that you will face when working in C++ and Unreal Engine on a daily basis.

Throughout the article, I will be using code snippets from “Project Orion”, a Co-op Action Roguelike sample game. You can browse the source code on GitHub.

Note: This guide should help you understand the specifics of C++ within Unreal Engine. It is meant to serve as a starting point and reference guide while you dive into the hands-on tutorials that demonstrate the practical use of C++ for your game. This guide is extensive, so don’t forget to bookmark it!

C++ vs. Blueprints

Before we begin, a quick word on C++ vs. Blueprint. It’s the most common discussion in the community. I love C++ and Blueprint and heavily use both. Building a solid foundation in C++ (your framework) and creating small game-specific ‘scripts’ on top using Blueprint is an extremely powerful combination.

While Blueprint in Unreal Engine is a powerful scripting tool for anyone looking to build games, learning C++ unlocks the full potential of the engine. Not every feature is exposed to Blueprint; for certain things you still need C++. Some game features may simply be easier to build and maintain in C++ in the first place, not to mention the potential performance gain of using code over Blueprint for the core systems of your game.

“In the early days, I went deep into C++ and tried to do pretty much everything with it, disregarding the power of Blueprint. In hindsight, this made my code more rigid than it needed to be and removed some flexibility for others to make adjustments without C++ knowledge. I later focused more on a healthy balance to great effect.”

The approach I recommend is to build the foundational systems (ability systems, inventories, world interaction, etc.) in C++ and expand these systems in Blueprint to tie it all together into actual gameplay. This is something we dive into during my C++ course as well. There we build the game framework and ability system so that small but powerful Blueprints can be created on top to define specific abilities, items, interactions, etc.

Alex Forsythe has a great video explaining how C++ and Blueprint fit together and why you should use both instead of evangelizing one and dismissing the other.

C++ Syntax & Symbols

While looking at C++ tutorials, you may be wondering about a few common symbols. I will explain their meaning and use cases without going too deep into their technical details. I’ll explain how they are most commonly used within Unreal Engine gameplay programming, not C++ programming in general.

Asterisk ‘*’ (Pointers)

Commonly known as “pointers”, they may sound scarier than they actually are within Unreal Engine, as most memory management is taken care of for us in gameplay programming. They are most commonly used to access objects like Actors in your level and to reference assets in your content folders such as sound effects or particle systems.

Pointers to Objects

The first way you’ll be using pointers is to access and track instances of your objects. In order to access your player, you’ll keep a pointer to the player class. For example, AMyCharacter* MyPlayer;

// Get pointer to the (first) player controller; it points to somewhere in memory containing all data about the object.
APlayerController* PC = GetWorld()->GetFirstPlayerController();

After running this code, the “PC” variable is now pointing to the same place in memory as the player controller we retrieved from World. We didn’t duplicate anything or create anything new, we just looked up where to find the object we need, and can now use it to do stuff for us such as calling functions on it or accessing its variables.

// Example function that tries to get the Actor underneath the player crosshair if there is any
AActor* FocusedActor = GetFocusedInteractionActor();
if (FocusedActor != nullptr)
{
  FocusedActor->Interact();
}
// alternative shorthand to check if pointer is valid is simply
if (FocusedActor)
{
  FocusedActor->Interact();
}
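Here is the same idea outside the engine, as a minimal plain C++ sketch without Unreal types. FakeActor and GetFocusedActor are hypothetical stand-ins. The returned pointer refers to the existing object rather than a copy, and the null check guards the case where nothing is focused.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an interactable Actor.
struct FakeActor
{
    int InteractCount = 0;
    void Interact() { ++InteractCount; }
};

// Hypothetical stand-in for GetFocusedInteractionActor(): returns a pointer to an
// existing element, or nullptr when nothing is under the crosshair.
FakeActor* GetFocusedActor(std::vector<FakeActor>& Actors, int FocusedIndex)
{
    if (FocusedIndex < 0 || static_cast<std::size_t>(FocusedIndex) >= Actors.size())
    {
        return nullptr;
    }
    return &Actors[static_cast<std::size_t>(FocusedIndex)];
}

bool TryInteract(std::vector<FakeActor>& Actors, int FocusedIndex)
{
    FakeActor* Focused = GetFocusedActor(Actors, FocusedIndex);
    if (Focused) // same as Focused != nullptr
    {
        Focused->Interact(); // modifies the object inside the vector, not a copy
        return true;
    }
    return false;
}
```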

Defensive Programming

It’s important to check that pointers are not “null” (written as nullptr in code, meaning not pointing to anything in memory) before attempting to call functions on them or change their variables; otherwise the engine will crash when executing that piece of code. So you will use the above if-statement often.

Perhaps even more important than knowing when to check for nullptr, is when NOT to include nullptr checks.

You should generally only check for nullptr if it’s likely and expected that a pointer is null and continue execution of the game regardless. In the above code example, FocusedActor is going to be nullptr any time there is no interactable Actor under the player’s crosshair.

Now imagine that in the example below we get a nullptr back for the player controller and (quietly) skip the if-statement where we would otherwise add an item to the inventory. You will scratch your head while playing, wondering why you did not receive this item. Having no player controller during gameplay is unexpected and not a valid state of the game, so we should not allow the game to (silently) continue. We either crash the game entirely or, at the very least, include an Assert to be immediately informed about this corrupt/broken state of the code.

APlayerController* PC = GetWorld()->GetFirstPlayerController();
if (PC)
{
    PC->AddToInventory(NewItem);
}
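Unreal provides assertion macros such as check() and ensure() for exactly this situation. The plain C++ sketch below shows the difference: the failure is reported instead of silently skipped. EnsureValid, FakePlayerController and GiveItem are hypothetical, and the sketch does not reproduce Unreal’s macros.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical helper loosely inspired by ensure(): report a failed
// condition loudly, then hand the result back so the caller can bail out.
bool EnsureValid(bool bCondition, const char* What)
{
    if (!bCondition)
    {
        std::fprintf(stderr, "Ensure failed: %s\n", What);
    }
    return bCondition;
}

// Hypothetical stand-in for a player controller with an inventory.
struct FakePlayerController
{
    std::vector<int> Inventory;
};

bool GiveItem(FakePlayerController* PC, int NewItem)
{
    // A missing controller is an invalid game state, so make it visible.
    if (!EnsureValid(PC != nullptr, "GiveItem called without a player controller"))
    {
        return false;
    }
    PC->Inventory.push_back(NewItem);
    return true;
}
```

In real Unreal code you would use ensure() or check() directly rather than writing your own helper.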

For more info on this concept, I recommend Ari Arnbjörnsson’s talk (at 22:48).

When creating components to be used in your Actor classes, we use similar syntax. In the header file, we define a pointer to a component. This will be a nullptr until we assign it an instance of the component. Here is an example from the header of RogueCharacter.h where we define a CameraComponent. (See TObjectPtr further down in this article, which has replaced raw pointers in headers for Unreal Engine 5.)

UPROPERTY(VisibleAnywhere)
UCameraComponent* CameraComp;

Now in the RogueCharacter.cpp constructor (called during spawning/instantiation of the Character class), we create an instance of the CameraComponent.

// This function is only used within constructors to create new instances of our components. Outside of the constructor we use NewObject<T>();
CameraComp = CreateDefaultSubobject<UCameraComponent>("CameraComp");
// We can now safely call functions on the component
CameraComp->SetupAttachment(SpringArmComp);

We have now created and assigned an instance to the CameraComp variable.

If you want to create a new object outside the constructor, you instead use NewObject<T>(). For creating and spawning Actors, use GetWorld()->SpawnActor<T>(), where T is the class you want to spawn, such as ARogueCharacter.

TObjectPtr

In Unreal Engine 5 a new concept was introduced called TObjectPtr to replace raw pointers (eg. UCameraComponent*) in header files with UProperties. This benefits the new systems such as virtualized assets among other things which is why it’s the new standard moving forward. The example above will now look as follows.

UPROPERTY(VisibleAnywhere)
TObjectPtr<UCameraComponent> CameraComp;

These benefits are for the editor only and in shipped builds it will function identically to raw pointers. You may continue to use raw pointers, but it’s advised by Epic to move over to using TObjectPtr whenever possible.

TObjectPtr is only for the member properties in the headers, your C++ code in .cpp files continues to use raw pointers as there is no benefit to using TObjectPtr in functions and short-lived scope.

Pointers to Assets

The other common way to use pointers is to reference assets. These don’t represent instances in your world/level, but instead point to loaded content in memory such as textures, sound effects, meshes, etc. (it’s still pointing to an object, which in this case is the class representing a piece of content or an “in-memory representation of an asset on disk”).

Much like the previous example of the Camera Component, in Unreal Engine 5 you will use TObjectPtr instead of UNiagaraSystem* (raw pointer) to reference assets. Raw pointers continue to work and shipped builds will effectively use raw pointers again automatically.

We can take a projectile attack ability as an example that references a particle system. The header defines the NiagaraSystem pointer:

/* Particle System played during attack animation */
UPROPERTY(EditAnywhere, Category = "Attack")
TObjectPtr<UNiagaraSystem> CastingEffect;
// Can point to an asset in our content folder, will be assigned something via the editor, not in the constructor as we did with components

Note that this pointer is going to be empty (nullptr) unless we assign it a specific Niagara particle system via the Unreal Editor. That’s why we add UPROPERTY(EditAnywhere) to expose the variable to be assigned with an asset.

Now in the class file of the projectile attack (line 25), we can use this asset pointer to spawn the specified particle system:

UNiagaraFunctionLibrary::SpawnSystemAttached(CastingEffect, Character->GetMesh(), HandSocketName, FVector::ZeroVector, FRotator::ZeroRotator, EAttachLocation::SnapToTarget, true);

Note: In this example, we didn’t check whether CastingEffect is a nullptr before attempting to use it. The SpawnSystemAttached function already does that and won’t crash if it wasn’t assigned a valid particle system.

Period ‘.’ and Arrow operator ‘->’ (Accessing Variables/Functions)

Used to access Variables or call Functions of objects. You can type in the period ‘.’ and it automatically converts to ‘->’ in source editors like Visual Studio when used on a pointer. While they are similar in use, the ‘.’ is used on Value-types such as structs (like FVector, FRotator, and FHitResult) and ‘->’ is generally used on classes that you access using Pointers, like Actor, GameMode, NiagaraSystem, etc.

Examples:

// pointer to Actor class called AMyCar ('A' prefix explained later)
AMyCar* MyCar = GetWorld()->SpawnActor<AMyCar>(...); 
// Calling function on class instance (pointer)
MyCar->StartEngine(); 
// Getting variable from class instance (pointer)
float Variable = MyCar->EngineTorque; 

// struct containing line trace info
FHitResult HitResult;
// FHitResult is a struct, meaning we use it as a value type and not a class instance.
FVector HitLocation = HitResult.ImpactPoint;

Note: You can use pointers with value types like struct, float, etc. You often don’t use pointers on these types in game code, hence why I used this as the differentiator.
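Here is a small plain C++ sketch of the same distinction. FVector3, FHitInfo and Car are hypothetical stand-ins, not Unreal types. It also shows that Ptr->Member is shorthand for (*Ptr).Member.

```cpp
#include <memory>

// Hypothetical value types, used directly with '.'
struct FVector3 { float X = 0.f, Y = 0.f, Z = 0.f; };
struct FHitInfo { FVector3 ImpactPoint; bool bBlockingHit = false; };

// Hypothetical class, typically accessed through a pointer with '->'
class Car
{
public:
    float EngineTorque = 0.f;
    void StartEngine() { bRunning = true; EngineTorque = 250.f; }
    bool IsRunning() const { return bRunning; }
private:
    bool bRunning = false;
};

float StartAndReadTorque(Car* MyCar)
{
    MyCar->StartEngine();          // call a function through a pointer
    return (*MyCar).EngineTorque;  // equivalent to MyCar->EngineTorque
}
```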

Double Colon ‘::’

Used to access ‘static functions’ (and variables) on classes. A good example is UGameplayStatics, which only consists of static functions, eg. to spawn particles and sounds. Generally, you’ll have very few static variables, so its main use is for easy-to-access functions. Static functions don’t need a class instance; you call them on the class type itself (see below).

Example of calling a static function on a class:

UGameplayStatics::PlaySoundAtLocation(this, SoundOnStagger, GetActorLocation());

Since these functions are static, they don’t belong to a specific ‘UWorld’. UWorld is generally the level/world you play in, but within the editor, it could be many other things (the static mesh editor has its own UWorld for example). Many things need UWorld, and so you will often see the first parameter of static functions look like this:

static void PlaySoundAtLocation(const UObject* WorldContextObject, USoundBase* Sound, FVector Location, ...)

UObject* WorldContextObject can be anything that lives in the relevant world, such as the character that calls this function. And so most of the time you can pass ‘this’ keyword as the first parameter. The const keyword in front of the parameter means you cannot make changes to that WorldContextObject within the context of the function.
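Here is a minimal plain C++ sketch of the pattern. FakeWorld, FakeObject and SoundStatics are hypothetical, and this is not how UGameplayStatics is implemented. The static function is called on the class itself. The const context pointer is only used to find the world, so the world gets modified, not the context object.

```cpp
// Hypothetical stand-in for UWorld.
struct FakeWorld
{
    int SoundsPlayed = 0;
};

// Hypothetical stand-in for any UObject that lives in a world.
struct FakeObject
{
    FakeWorld* OwningWorld = nullptr;
    FakeWorld* GetWorld() const { return OwningWorld; }
};

class SoundStatics
{
public:
    // static: no SoundStatics instance is needed to call this
    static bool PlaySound(const FakeObject* WorldContextObject)
    {
        if (WorldContextObject == nullptr)
        {
            return false;
        }
        FakeWorld* World = WorldContextObject->GetWorld(); // const method, allowed on a const pointer
        if (World == nullptr)
        {
            return false;
        }
        ++World->SoundsPlayed; // the world is modified, not the const context object
        return true;
    }
};
```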

You will also see a double colon when defining the body of a function itself (regardless of whether it is static or not):

void ARogueAICharacter::Stagger(UAnimMontage* AnimMontage, FName SectionName /* = NAME_None*/)
{
  // ... code in the function (this is in the .cpp file)
}

Ampersand ‘&’ (References & Address operator)

Also known as the reference symbol and address operator. I find that I don’t use this as often as the others within gameplay code specifically. It’s still important to know how to use it, as you will need it to pass around functions when setting timers or binding input.

Pass by Reference

A common concept is to ‘pass by reference’ a value type like a struct, or a big Array filled with thousands of objects. If you were to pass these variables into a function, without the reference symbol, two things happen:

The code creates a copy of the parameter value, in the case of a big array this can be costly and unnecessary.
More importantly, because a copy is created, you can’t simply change that variable and have it change in the ‘original’ variable too, you basically cloned it and left the original variable unchanged. If you want to change the original variable inside the function, you need to pass it in as a reference (this is specific to value types like float, bool, structs such as FVector, etc.) Let me give you an example.

void ChangeTime(float TimeToUpdate)
{
    // add 1 second to the total time
    TimeToUpdate += 1.0f;
}

Now calling this function as seen in the example below will print out 0.0f at the end since the original TimeVar was never actually changed.

float TimeVar = 0.0f;

ChangeTime(TimeVar);

UE_LOG(LogTemp, Log, TEXT("%f"), TimeVar); // Logs 0.000000 because we cloned the original variable and didn't pass the original into the function, so any change made to that value inside the function is lost.

Now we change the function to:

void ChangeTime(float& TimeToUpdate)
{
    // add 1 second to the total time
    TimeToUpdate += 1.0f;
}

Now if we use the same code as before, we get a different result: The printed value would now be 1.0f.

float TimeVar = 0.0f;

ChangeTime(TimeVar);

UE_LOG(LogTemp, Log, TEXT("%f"), TimeVar); // Logs 1.000000 because we passed the original value in by reference, so the function added 1.0f to TimeVar itself instead of to a copy.
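The first point in the list above, avoiding copies of big containers, is usually handled with a const reference. No copy is made, and the compiler stops the function from modifying the original. Below is a plain C++ sketch where the function names and Point3 are hypothetical and std::vector stands in for TArray. In Unreal code, the same applies to const TArray<T>& parameters.

```cpp
#include <vector>

// Hypothetical value type standing in for a struct like FVector.
struct Point3
{
    float X = 0.f, Y = 0.f, Z = 0.f;
};

// const&: no copy of the (possibly huge) array, and read-only inside the function.
float SumAll(const std::vector<float>& Values)
{
    float Total = 0.f;
    for (float Value : Values)
    {
        Total += Value;
    }
    return Total;
}

// Non-const &: changes are written to the caller's struct.
void Offset(Point3& Point, float Amount)
{
    Point.X += Amount;
    Point.Y += Amount;
    Point.Z += Amount;
}
```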

Address Operator

Another important use is the address operator, which even lets us pass functions as parameters into other functions. This is very useful for binding user input and setting timers to trigger specific functions.

The BindAxis() function in the example below needs to know which function to call when the mapped input is triggered. We pass in the function and use the address operator (&).

// Called to bind functionality to input
void ARogueCharacter::SetupPlayerInputComponent(UInputComponent* PlayerInputComponent)
{
  Super::SetupPlayerInputComponent(PlayerInputComponent);

  PlayerInputComponent->BindAxis("MoveForward", this, &ARogueCharacter::MoveForward);
  PlayerInputComponent->BindAxis("MoveRight", this, &ARogueCharacter::MoveRight);
}

Another common use case is to pass a function into timers. The third parameter is again the function we pass in to be called when the timer elapses.

// Activate the fuze to explode the bomb after several seconds
GetWorldTimerManager().SetTimer(FuzeTimerHandle, this, &ARogueBombActor::Explode, MaxFuzeTime, false);
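Under the hood, &ARogueBombActor::Explode is a pointer to a member function. It needs an object to be called on, which is why the timer also takes this. Below is a simplified plain C++ sketch of the idea. SimpleTimer and Bomb are hypothetical and far simpler than Unreal’s timer manager.

```cpp
// Hypothetical actor with a callback we want to trigger later.
class Bomb
{
public:
    int ExplodeCount = 0;
    void Explode() { ++ExplodeCount; }
};

// Hypothetical one-shot timer storing an object plus a member function pointer.
template <typename T>
class SimpleTimer
{
public:
    void Set(T* InObject, void (T::*InCallback)(), float InDelay)
    {
        Object = InObject;
        Callback = InCallback;
        Remaining = InDelay;
    }

    // Advance time; fire once when the delay has elapsed.
    void Tick(float DeltaTime)
    {
        if (Object == nullptr || Callback == nullptr)
        {
            return;
        }
        Remaining -= DeltaTime;
        if (Remaining <= 0.f)
        {
            (Object->*Callback)(); // call the member function on the stored object
            Object = nullptr;      // one-shot, like passing false for looping
            Callback = nullptr;
        }
    }

private:
    T* Object = nullptr;
    void (T::*Callback)() = nullptr;
    float Remaining = 0.f;
};
```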

Public, Protected, Private

These keywords can mark variables and functions in the header file to give or limit ‘access rights’ for other classes.

private: can only be accessed inside that class and not other classes or even derived classes.
protected: it cannot be accessed from other classes but can be accessed in the derived class.
public: other classes have open access to the variable or function.

Generally, you only want to expose what can be safely called/changed from the outside (other classes). You don’t want to make your variables public if they should trigger an event whenever they are changed. Instead, you mark the variable protected or even private and create a public function instead which sets the variable and calls the desired event.

private:
  int32 MyInt;

public:
  void SetMyInt(int32 NewInt);
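The plain C++ sketch below fleshes out the snippet above to show why the setter matters. The class names are hypothetical, and the change counter stands in for an event.

```cpp
class HealthHolder
{
public:
    // The only way for outside code to change Health, so the "event" always fires.
    void SetHealth(int NewHealth)
    {
        if (NewHealth == Health)
        {
            return;
        }
        Health = NewHealth;
        ++ChangeEvents; // stand-in for broadcasting an OnHealthChanged event
    }

    int GetHealth() const { return Health; }
    int GetChangeEvents() const { return ChangeEvents; }

protected:
    int Health = 100; // derived classes may access it

private:
    int ChangeEvents = 0; // only HealthHolder itself can touch this
};

class Healer : public HealthHolder
{
public:
    void Heal(int Amount) { SetHealth(Health + Amount); } // protected member is accessible here
};
```

Outside code cannot write H.Health = 0; directly, so every change goes through SetHealth().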

Forward Declaring Classes

Forward declaring C++ classes is done in header files as an alternative to including their full headers via #include. The purpose of forward declaring is to reduce compile times and dependencies between classes.

Let’s say we wish to use UNiagaraSystem class in another header named MyCharacter.h. The header file (and compiler) doesn’t need to know everything about UNiagaraSystem, just that the word is used as a class.

#include "CoreMinimal.h"
//#include "NiagaraSystem.h" // << We don't need to include the entire file

class UNiagaraSystem; // << We can instead just 'forward declare' the type.

UCLASS()
class ACTIONROGUELIKE_API ARogueCharacter : public ACharacter
{
  GENERATED_BODY()

  TObjectPtr<UNiagaraSystem> CastingEffect;
// ...

The class keyword provides the minimum the compiler requires to understand that the word is in fact a class. Including the .h file for the class instead could negatively impact our compile times. Any change to an included header causes every file that includes it to recompile too, including files that include your MyCharacter.h elsewhere in your code.
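This rule of thumb can be shown in a single plain C++ file, using hypothetical class names. A forward-declared, still incomplete type is enough for pointer members. Anything that needs the type’s size or members requires the full definition.

```cpp
class EffectAsset; // forward declaration: the compiler only knows this is a class

class CharacterSketch
{
public:
    EffectAsset* CastingEffect = nullptr; // fine: pointers to incomplete types are allowed
    // EffectAsset InlineEffect;          // would not compile: needs the full definition

    bool HasEffect() const { return CastingEffect != nullptr; }
};

// The full definition would normally live in its own header, included only by the
// .cpp files that actually use its members.
class EffectAsset
{
public:
    int Id = 0;
};

int GetEffectId(const CharacterSketch& Character)
{
    // Member access needs the complete type, which is available from here on.
    return Character.CastingEffect ? Character.CastingEffect->Id : -1;
}
```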

Here is the character class example that forward declares all the Components used in the header instead of including their .h files.

Forward Declaration is mentioned in Epic’s Coding Standards as well. “If you can use forward declarations instead of including a header, do so.”

Casting (Cast)

Casting to specific classes is something you’ll use all the time. Casting UObject pointers with Cast<T>() in Unreal Engine is a bit different from ‘raw C++’ in that it’s safe to cast to types that might not be valid. Your code won’t crash; the cast instead just returns a nullptr (null pointer).

As an example, you might want to Cast your APawn* to your own character class (eg. ARogueCharacter) as casting is required to access the variables and functions declared in that specific class.

APawn* MyPawn = GetPawn();
ARogueCharacter* MyCharacter = Cast<ARogueCharacter>(MyPawn);
if (MyCharacter) // verify the cast succeeded before calling functions
{
  // Respawn() is defined in ARogueCharacter, and doesn't exist in the base class APawn. Therefore we must first Cast to the appropriate class.
  MyCharacter->Respawn(); 
}
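Unreal’s Cast<T>() works through the engine’s own reflection data. Its behavior is similar to standard C++ dynamic_cast on pointers, which also returns nullptr on failure. Below is a plain C++ analogue where Pawn and RogueCharacter are hypothetical stand-ins.

```cpp
class Pawn
{
public:
    virtual ~Pawn() = default; // dynamic_cast requires a polymorphic base class
};

class RogueCharacter : public Pawn
{
public:
    int RespawnCount = 0;
    void Respawn() { ++RespawnCount; } // only exists on the derived class
};

bool TryRespawn(Pawn* MyPawn)
{
    // Returns nullptr if MyPawn is null or is not actually a RogueCharacter.
    if (RogueCharacter* MyCharacter = dynamic_cast<RogueCharacter*>(MyPawn))
    {
        MyCharacter->Respawn();
        return true;
    }
    return false;
}
```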

It’s not always preferable to cast to specific classes, especially in Blueprint as this can have a negative impact on how much data needs to be loaded into memory. Any time you add a Cast to a certain Blueprint class on your EventGraph that object will be loaded into memory immediately (not when the Cast-node is hit at runtime, but as soon as the Blueprint itself gets loaded/created), causing a cascade of loaded objects. Especially when Blueprints reference a lot of assets (meshes, particles, textures) this has a large impact on your project’s (load/memory) performance.

Blueprint Example: BlueprintA has a cast-to node in its EventGraph that casts to BlueprintB. Now as soon as BlueprintA is used/loaded in-game, BlueprintB is loaded at the same time. They will now both remain in memory even if you don’t actually have any instances of BlueprintB in your Level.

This often becomes a problem when developers put all their code in the Character Blueprint. Everything you Cast to on its EventGraph (or functions) will be loaded into memory including all their textures, models, and particles.

Since all C++ classes will be loaded into memory at startup regardless, the main reason to cast to base classes is compilation time. It will avoid having to recompile classes that reference (#include) your class headers whenever you make a change. This can have a cascading effect of recompiling classes that depend on each other.

C++ Example: Only cast to ARogueCharacter if the function or variable you need is first declared in that class. If you instead need something already declared in APawn, you should simply cast to APawn instead.

One way to reduce class dependencies is through interfaces…so that’s what we will talk about next.

Interfaces

Interfaces are a great way to add functions to multiple classes without specifying any actual functionality yet (implementation). Your player might be able to interact with a large variety of different Actors in the level, each with a different reaction/implementation. A lever might animate, a door could open or a key gets picked up and added to the inventory.

Interfaces in Unreal are a bit different from normal programming interfaces in that you are not required to implement the function; it’s optional.

An alternative to interfaces is to create a single base class (as mentioned earlier) that contains an Interact() function that child classes can override to implement their own behavior. Having a single base class is not always ideal or even possible depending on your class hierarchy, and that’s where interfaces might solve your problem.

Interfaces are a little odd at first in C++ as they require two classes with different prefix letters. They are both used for different reasons but first, let’s look at the header.

// This class does not need to be modified.
UINTERFACE(MinimalAPI)
class URogueGameplayInterface : public UInterface
{
  GENERATED_BODY()
};

/**
 * 
 */
class ACTIONROGUELIKE_API IRogueGameplayInterface
{
  GENERATED_BODY()

  // Add interface functions to this class. This is the class that will be inherited to implement this interface.
public:

  UFUNCTION(BlueprintCallable, BlueprintNativeEvent)
  void Interact(APawn* InstigatorPawn);
};

With the interface class defined you can ‘inherit’ from it in other C++ classes and implement actual behavior. For this, you use the “I” prefixed class name. Next to public AActor we add , public IRogueGameplayInterface to specify we want to inherit the functions from the interface.

UCLASS()
class ACTIONROGUELIKE_API ARogueItemChest : public AActor, public IRogueGameplayInterface // 'inherit' from interface
{
  GENERATED_BODY()

  // declared as _Implementation since we defined the function in interface as BlueprintNativeEvent
  void Interact_Implementation(APawn* InstigatorPawn); 
};

BlueprintNativeEvent allows C++ to provide a base implementation while Blueprint derived classes can choose to override or extend this function. In C++ the function implementation will have an _Implementation suffix added. This is from code generated by Unreal.

To check whether a specific class implements (inherits from) the interface, you can use Implements<T>(). For this, you use the “U” prefixed class name.

if (MyActor->Implements<URogueGameplayInterface>())
{
}

Calling interface functions is again unconventional. The signature looks as follows: IMyInterface::Execute_YourFunctionName(ObjectToCallOn, Params); This is another case where you use the “I” prefixed class.

IRogueGameplayInterface::Execute_Interact(MyActor, MyParam1);

Important: There are other ways to call this function, such as casting your Actor to the interface type and calling the function directly. However, this fails entirely when interfaces are added/inherited to your class in Blueprint instead of in C++, so it’s recommended to just avoid that altogether.

However, if you want to share functionality between Actors but don’t want to use a base class then you could use an ActorComponent.

Steve Streeting has more details on using Interfaces which I recommend checking out. There is a code example in the Action Roguelike project as well using RogueGameplayInterface used by InteractionComponent to call Interact() on any Actor implementing the interface.
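For comparison, here is a plain C++ sketch of the same idea, with all names hypothetical. Plain C++ has no Blueprint layer, so casting to the interface with dynamic_cast is fine here. In Unreal, stick with Implements<>() and Execute_Interact() as described above. The default empty body mirrors the optional nature of Unreal interface functions.

```cpp
class PawnSketch {};

class IInteractable
{
public:
    virtual ~IInteractable() = default;
    // Default no-op body, so implementing classes may skip overriding it.
    virtual void Interact(PawnSketch* InstigatorPawn) { (void)InstigatorPawn; }
};

class ActorSketch
{
public:
    virtual ~ActorSketch() = default;
};

class ItemChest : public ActorSketch, public IInteractable
{
public:
    bool bLidOpen = false;
    void Interact(PawnSketch* /*InstigatorPawn*/) override { bLidOpen = !bLidOpen; }
};

class Rock : public ActorSketch {}; // does not implement the interface

// Calls Interact() on any actor that implements the interface, ignores the rest.
bool TryInteract(ActorSketch* Actor, PawnSketch* InstigatorPawn)
{
    if (IInteractable* Interactable = dynamic_cast<IInteractable*>(Actor))
    {
        Interactable->Interact(InstigatorPawn);
        return true;
    }
    return false;
}
```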

Delegates

Delegates (also known as Events) allow code to call one or multiple bound functions when triggered. Sometimes you’ll see this referred to as Callbacks. For example, it can be incredibly helpful to bind/listen to a delegate and be notified when a value (such as character health) changes. This can be a lot more efficient than polling whether something changed every frame using Tick().

There are several types of these delegates/events. I’ll explain the most commonly used ones for game code using practical examples rather than low-level language details. I’m also not covering all the different ways of binding (only focusing on the more practical ways instead) or niche use cases, you can find more details on the official documentation for those.

Declaring and Using Delegates

You start by declaring the delegate with a macro. There are variants available that allow passing in parameters; these have the following suffixes: _OneParam, _TwoParams, _ThreeParams, etc. You define these in the header file, ideally above the class where you want to call them.

// These macros will sit at the top of your header files.
DECLARE_DYNAMIC_MULTICAST_DELEGATE()
DECLARE_DYNAMIC_MULTICAST_DELEGATE_TwoParams()

We’ll start by showing the process of declaring and using delegates in detail with a commonly used type, and then explain the other types more briefly as they share the same concepts.

Multicast Dynamic

This is one of the most used types of delegate in game code, as they can be exposed to Blueprint to bind and receive callbacks.

Note: Dynamic Multicast Delegates are also known as Event Dispatchers in Blueprint.

The macros take at least one parameter, which defines their name. For example, FOnAttributeChanged could be the name of a delegate we execute whenever an attribute such as Health changes.

DECLARE_DYNAMIC_MULTICAST_DELEGATE(<typename>)
DECLARE_DYNAMIC_MULTICAST_DELEGATE_TwoParams(<typename>, <paramtype1>, <paramvarname1>, <paramtype2>,<paramvarname2>)

Here is one example of a delegate with four parameters to notify code about a change to an attribute. The type and variable names are split by commas, unlike normal functions.

DECLARE_DYNAMIC_MULTICAST_DELEGATE_FourParams(FOnAttributeChanged, AActor*, InstigatorActor, URogueAttributeComponent*, OwningComp, float, NewValue, float, Delta);

You now add the delegate in your class header, which may look as follows:

UPROPERTY(BlueprintAssignable, Category = "Attributes")
FOnAttributeChanged OnHealthChanged;

You may have noticed BlueprintAssignable, this is a powerful feature of the Dynamic delegates which can be exposed to Blueprint and used on the EventGraph.

Executing Delegates

Finally, to actually trigger the callback we call OnHealthChanged.Broadcast() and pass in the expected parameters.

OnHealthChanged.Broadcast(InstigatorActor, this, NewHealth, Delta);

Binding to Delegates

Binding in C++

You should *never* bind your delegates in the constructor. Use either AActor::PostInitializeComponents() or BeginPlay() instead. Binding in the constructor can get delegates serialized into the Blueprint, and they will still be called even after you remove the delegate binding in C++.

Since delegates are weakly referenced you often don’t need to unbind delegates when destroying objects/actors unless you want to manually stop listening/reacting to specific events.

You can bind to a delegate by calling .AddDynamic(). The first parameter takes a UObject, for which we can pass this. The second parameter takes the address of the function (YourClass::YourFunction), which is why we pass the function with the ampersand (&) symbol, the address operator.

void ARogueAICharacter::PostInitializeComponents()
{
  Super::PostInitializeComponents();

  AttributeComp->OnHealthChanged.AddDynamic(this, &ARogueAICharacter::OnHealthChanged);
}

The above OnHealthChanged function is declared with UFUNCTION() in the header.

UFUNCTION()
void OnHealthChanged(AActor* InstigatorActor, URogueAttributeComponent* OwningComp, float NewHealth, float Delta);
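Here is a plain C++ sketch of a multicast delegate built on std::function that makes the declare/bind/broadcast flow concrete without the macros. MulticastDelegate, AttributeSketch and HealthBar are hypothetical. Unreal’s delegates are implemented very differently and add features such as weak object bindings.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical minimal multicast delegate: any number of listeners, called in bind order.
template <typename... ArgTypes>
class MulticastDelegate
{
public:
    void Add(std::function<void(ArgTypes...)> Listener)
    {
        Listeners.push_back(std::move(Listener));
    }

    void Broadcast(ArgTypes... Args) const
    {
        for (const auto& Listener : Listeners)
        {
            Listener(Args...);
        }
    }

private:
    std::vector<std::function<void(ArgTypes...)>> Listeners;
};

// Hypothetical attribute component that owns the delegate (NewValue, Delta).
class AttributeSketch
{
public:
    MulticastDelegate<float, float> OnHealthChanged;

    void ApplyHealthChange(float Delta)
    {
        Health += Delta;
        OnHealthChanged.Broadcast(Health, Delta);
    }

private:
    float Health = 100.f;
};

// Hypothetical listener binding one of its member functions, similar in spirit to AddDynamic.
class HealthBar
{
public:
    float DisplayedHealth = 0.f;

    void BindTo(AttributeSketch& Attributes)
    {
        Attributes.OnHealthChanged.Add([this](float NewHealth, float Delta) { OnHealthChanged(NewHealth, Delta); });
    }

    void OnHealthChanged(float NewHealth, float /*Delta*/) { DisplayedHealth = NewHealth; }
};
```

Unlike Unreal’s dynamic delegates, the captured this in this sketch is not weakly referenced, so the listener must outlive the component it binds to.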

Binding in Blueprint

You can easily bind your dynamic delegates in Blueprint. When implemented on an ActorComponent as in the example below you can select the Component in the outliner and click the “+” symbol in its details panel. This creates the Delegate on the EventGraph and is already bound for us.

You can also manually bind the delegates via the EventGraph (eg. binding to another Actor’s delegates).

Note: Dynamic delegates are less performant than non-dynamic (seen below) variants. It’s therefore advisable to only use this type when you want to expose it to Blueprint.

C++ Delegates

Macro: DECLARE_DELEGATE, DECLARE_DELEGATE_OneParam

When used only in C++, delegates can also carry extra ‘payload’ parameters. These are bound when the delegate is created rather than declared in its signature. In the following example, we’ll use a more complex use case: asynchronously loading game assets.

Unreal’s StreamableManager defines an FStreamableDelegate.

DECLARE_DELEGATE(FStreamableDelegate);

This doesn’t specify any parameters yet and lets us define what we wish to pass along in our own game code.

The following is taken from RogueGameModeBase in the ActionRoguelike project (link to code). We asynchronously load the data of an enemy Blueprint to spawn them once the load has finished.

if (UAssetManager* Manager = UAssetManager::GetIfValid())
{
  // Primary Id is part of AssetManager, we grab one from a DataTable
  FPrimaryAssetId MonsterId = SelectedMonsterRow->MonsterId;

  TArray<FName> Bundles;

  // A very different syntax, we create a delegate via CreateUObject and pass in the parameters we want to use once loading has completed several frames or seconds later. (In this case the MonsterId is the asset we are loading via LoadPrimaryAsset and Locations[0] is the desired spawn location once loaded)
  FStreamableDelegate Delegate = FStreamableDelegate::CreateUObject(this, &ARogueGameModeBase::OnMonsterLoaded, MonsterId, Locations[0]);

  // Requests the load in Asset Manager on the MonsterId (first param) and passes in the Delegate we just created
  Manager->LoadPrimaryAsset(MonsterId, Bundles, Delegate);
}

In the example above we create a new Delegate variable and fill it with variables, in this case MonsterId and the first vector location from an array (Locations[0]). Once the LoadPrimaryAsset function from Unreal has finished, it will call the delegate OnMonsterLoaded with the provided parameters we passed into the CreateUObject function previously.

void ARogueGameModeBase::OnMonsterLoaded(FPrimaryAssetId LoadedId, FVector SpawnLocation)
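To make the payload idea more concrete, here is a plain C++ model of what binding a member function with extra parameters achieves conceptually. This is not Unreal’s delegate implementation: SimpleDelegate, CreateWithPayload, GameMode and SimulateLoadComplete are hypothetical names, and the load is only simulated. Unreal’s UObject-bound delegates additionally hold a weak reference to the bound object, which this sketch omits.

```cpp
#include <functional>
#include <string>

// A parameterless callback, comparable in shape to FStreamableDelegate.
using SimpleDelegate = std::function<void()>;

// Hypothetical helper: binds a member function plus "payload" values.
// The payload is copied into the closure now and used when the delegate fires.
template <typename ObjectT, typename... FuncArgs, typename... PayloadArgs>
SimpleDelegate CreateWithPayload(ObjectT* Object,
                                 void (ObjectT::*Func)(FuncArgs...),
                                 PayloadArgs... Payload)
{
    return [Object, Func, Payload...]() { (Object->*Func)(Payload...); };
}

// Hypothetical receiver, standing in for ARogueGameModeBase.
struct GameMode
{
    std::string LoadedId;
    float SpawnX = 0.f;

    void OnMonsterLoaded(std::string Id, float X)
    {
        LoadedId = Id;
        SpawnX = X;
    }
};

// Simulates the asset load finishing later and executing the callback.
inline void SimulateLoadComplete(const SimpleDelegate& Delegate)
{
    if (Delegate)
    {
        Delegate();
    }
}
```

The callback signature stays empty; everything OnMonsterLoaded needs travels with the delegate itself, which is why FStreamableDelegate can be declared without parameters.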

Another example of using delegates/callbacks is with Timers. We don’t need to specify our own delegate first and can directly pass in the function address so long as it has no parameters. It’s possible to use timers with parameters as well. To learn more you can check out my blog post on Using C++ Timers.

There is a lot more to talk about, but this should provide a core understanding from which to build. There are many more variants to the macros and different ways to bind…which could be a whole article on its own.

To read more about delegates I recommend BenUI’s Intro to Delegates and Advanced Delegates in C++.

Public/Private Folders

When adding a new class, the Unreal Engine class wizard gives you the option to split the header and code files into /public/YourClass.h and /private/YourClass.cpp folders instead of placing both in the module root.

Public and private folders define which files are available to use in other modules. Generally, your header files (YourClass.h) are placed in the Public folder so other modules can gain access and the code files (YourClass.cpp) are in the Private folder. Headers that are not meant to be used directly by other modules can go into the Private folder as well.

Your primary game module doesn’t need this public/private structure if you don’t intend to have other Modules depend on it.

I recommend checking out Ari’s talk on modules for more information on Modules and how to use them.

Class Prefixes (F, A, U, E, G, T, …)

Classes in Unreal have a prefix, for example, the class ‘Actor’ is named ‘AActor’ when seen in C++. These are helpful in telling you more about the type of object. Here are a few important examples.

A. Actor derived classes (including Actor itself) have A as prefix, eg. APawn, AGameMode, AYourActorClass

U. UObject derived classes, including UBlueprintFunctionLibrary, UActorComponent and UGameplayStatics. Yes, AActor derives from UObject, but it uses its own A prefix instead.

F. Structs, like FHitResult, FVector, FRotator, and your own structs should start with F.

E. The convention for enum types. (EEnvQueryStatus, EConstraintType, …)

G. “Globals”, for example GEngine->AddOnscreenDebugMessage() where GEngine is global and can be accessed anywhere. You won’t use these very often in gameplay programming, though.

T. Template classes that take one or more types as parameters, like TSubclassOf (a class derived from T, which can be almost anything), TArray (lists), TMap (dictionaries), etc. Examples:

// A list of strings.
TArray<FString> MyStrings;

// A list of actors
TArray<AActor*> MyActors;

// Can be assigned a CLASS (not an instance of an actor) that is either the GameMode class or derived from GameMode.
TSubclassOf<AGameMode> GameModeClass;

Mike Fricker (Lead Technical Director) explained the origins of the “F” prefix:

“The ‘F’ prefix actually stands for “Float” (as in Floating Point.)”

“Tim Sweeney wrote the original “FVector” class along with many of the original math classes, and the ‘F’ prefix was useful to distinguish from math constructs that would support either integers or doubles, even before such classes were written. Much of the engine code dealt with floating-point values, so the pattern spread quickly to other new engine classes at the time, then eventually became standard everywhere.”

“This was in the mid-nineties sometime. Even though most of Unreal Engine has been rewritten a few times over since then, some of the original math classes still resemble their Unreal 1 counterparts, and certain idioms remain part of Epic’s coding standard today.”

Project Prefixes

Projects in Unreal should use their own (unique) prefix to signify their origin. For example, all classes in Unreal Tournament use “UT” (AUTActor, UUTAbility), and Fortnite uses the “Fort” prefix (AFortActor, UFortAbility, etc).

In the many code examples in this guide, I used “Rogue” as the prefix. The code examples in this guide are taken from the Action Roguelike project.

Common Engine Types

Besides the standard types like float, int32 and bool, which I won’t cover as there is nothing special about them in Unreal Engine, Unreal has built-in classes to handle very common logic that you will use a lot throughout your programming. Here are a few of the most commonly seen types. Luckily the official documentation has some information on these types, so I will be referring to it a lot.

Integers are special in that you are not supposed to use “int” in serialized UProperties, as the size of int can change per platform. That’s why Unreal uses its own fixed-size int16, int32, uint16, etc. - Source

FString, FName, FText

There are three types of ‘strings’ in Unreal Engine, used for distinctly different things. It’s important to select the right type for the job or you’ll suffer later. The most common problem is using FString for UI text instead of FText; this becomes a huge headache if you plan to do any sort of localization.

FString The base representation for strings in Unreal Engine. Used often when debugging and logging information or passing raw string information between systems (such as REST APIs). Can be easily manipulated.
FName Essentially hashed strings that don’t change once created and allow much faster comparisons between two FNames. Used often for look-ups such as socket names on a Skeletal Mesh and as GameplayTags.
FText Front-end text to display to the user. Can be localized into many languages. All your front-facing text should always be FText for this reason.

Here is a piece of Documentation on String handling including how to convert between the different types.
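Why are FName comparisons so fast? A simplified model: every unique string is stored once in a table, and a name is just an index into that table, so comparing two names is a single integer compare instead of a character-by-character string compare. The NameTable, Name and MakeName types below are hypothetical. The real FName is stored in a global, thread-safe name table and compares case-insensitively, neither of which this sketch models.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical name table: each unique string is stored exactly once
// and identified by its index.
class NameTable
{
public:
    std::uint32_t Intern(const std::string& Str)
    {
        auto It = Lookup.find(Str);
        if (It != Lookup.end())
        {
            return It->second;  // already stored, reuse the index
        }
        const auto Index = static_cast<std::uint32_t>(Entries.size());
        Entries.push_back(Str);
        Lookup.emplace(Str, Index);
        return Index;
    }

    const std::string& ToString(std::uint32_t Index) const { return Entries[Index]; }

private:
    std::unordered_map<std::string, std::uint32_t> Lookup;
    std::vector<std::string> Entries;
};

// Lightweight handle: equality is a single integer comparison.
struct Name
{
    std::uint32_t Index;
    bool operator==(const Name& Other) const { return Index == Other.Index; }
};

Name MakeName(NameTable& Table, const std::string& Str)
{
    return Name{ Table.Intern(Str) };
}
```

The cost of hashing the string is paid once when the name is created, which is why FName suits repeated look-ups such as socket names, while FString remains the type for text you manipulate.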

FVector, FRotator, FTransform (FQuat)

Used to specify the location, rotation, and scale of things in the World. A line trace for example needs two FVectors (Locations) to specify the start and end of the line. Every Actor has an FTransform that contains Location, Rotation, and Scale to give it a place in the world.

FVector 3-axis as XYZ where Z is up. Specifies either a location or a direction, much like common vector math.
FRotator 3 params Pitch, Yaw and Roll to give it a rotation value.
FTransform consists of FVector (Location), FRotator (Rotation) and FVector (Scale in 3-axis).
FQuat Quaternion, another way to specify a rotation. It can prevent gimbal lock, but you will mostly use FRotator in game code; FQuat is rarely used outside the engine modules. (It’s also not exposed to Blueprint)

TArray, TMap, TSet

Basically variations of lists of objects/values. TArray is a simple list that you can add items to and remove items from. TMap is a dictionary, meaning it has Keys and Values (where each Key must be unique), eg. TMap<int32, AActor*> where a bunch of Actors are mapped to unique integers. And finally, TSet is a hashed collection that requires its items to be unique and offers fast look-ups. It can be great for certain performance scenarios, but typically you use TArray.

TSubclassOf

Very useful for assigning classes that derive from a certain type. For example, you may expose this variable to Blueprint where a designer can assign which projectile class must be spawned.

UPROPERTY(EditAnywhere) // Expose to Blueprint
TSubclassOf<AProjectileActor> ProjectileClass; // The class to assign in Blueprint, eg. BP_MagicProjectile.

Now the designer will get a list of classes to assign that derive from ProjectileActor, making the code very dynamic and easy to change from Blueprint.

Here we use the TSubclassOf variable ProjectileClass to spawn a new instance: (link to code)

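FActorSpawnParameters SpawnParams; // Optional spawn settings such as Owner, Instigator and collision handling
FTransform SpawnTM = FTransform(ProjRotation, HandLocation);
GetWorld()->SpawnActor<AActor>(ProjectileClass, SpawnTM, SpawnParams);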

Documentation on TSubclassOf

C++ MACROS (& Unreal Property System)

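The ALL CAPS macros are expanded by the C++ preprocessor. Some, like GENERATED_BODY, ‘unfold’ into large pieces of boilerplate code, while others, like UPROPERTY and UFUNCTION, act as markup that the Unreal Header Tool reads to generate reflection code. In Unreal Engine, macros are most often used by the Unreal Property System and to add boilerplate code to our class headers. These examples are all macros, but macros can be used for a lot more than shown below.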

UFUNCTION

Allows extra markup on functions and exposes them to the Property System (Reflection) of Unreal. Commonly used to expose functions to Blueprint. Sometimes required by the engine to bind functions to delegates (eg. binding a timer to call a function).

Here is additional information in a blog post on the available keywords within UFUNCTION() and how to use them. There are a lot of function specifiers worth checking out, and BenUI does a great job of detailing what’s available.

// Can be called by Blueprint
UFUNCTION(BlueprintCallable, Category = "Action")
bool IsRunning() const;

// Can be overridden by Blueprint to override/extend behavior but cannot be called by Blueprint (only C++)
UFUNCTION(BlueprintNativeEvent, Category = "Action")
void StartAction(AActor* Instigator);

UPROPERTY

Allows marking up variables and exposing them to the Property System (Reflection) of Unreal. Commonly used to expose your C++ to Blueprint, but it can do a lot more using this large list of property specifiers. Again, it’s worth checking out Unreal Garden’s article on UPROPERTY specifiers.

// Expose to Blueprint and allow editing of its defaults and only grant read-only access in the node graphs.
UPROPERTY(EditDefaultsOnly, BlueprintReadOnly, Category = "UI")
TSoftObjectPtr<UTexture2D> Icon;

// Mark 'replicated' to be synchronized between client and server in multiplayer.
UPROPERTY(Replicated)
URogueActionComponent* ActionComp;

GENERATED_BODY

Placed at the top of classes and structs; Unreal uses it to add boilerplate code required by the engine.

GENERATED_BODY()

USTRUCT, UCLASS, UENUM

These macros are required when defining new classes, structs, and enums in Unreal Engine. When you create a new class, this is already added for you in the header. By default they are empty, like UCLASS(), but they can be used to add additional markup to an object, for example:

USTRUCT(BlueprintType)
struct FMyStruct
{
	GENERATED_BODY()
};

UE_LOG (Logging)

Macro to easily log information including a category (eg. LogAI, LogGame, LogEngine) and a severity (eg. Log, Warning, Error, or Verbose) and can be an incredibly valuable tool to verify your code by printing out some data while playing your game much like PrintString in Blueprint.

// The simple logging without additional info about the context
UE_LOG(LogAI, Log, TEXT("Just a simple log print"));
// Putting actual data and numbers here is a lot more useful though!
UE_LOG(LogAI, Warning, TEXT("X went wrong in Actor %s"), *GetName()); 

The above syntax may look a bit scary. The third parameter is a format string we can fill with useful data; in the above case we print the name of the object so we know in which instance this happened. The asterisk (*) before GetName() calls FString’s overloaded operator*, which returns a pointer to the raw character data (const TCHAR*) that the %s format specifier expects. The Unreal Wiki has a much more detailed explanation of logging.
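If the asterisk still feels like magic, this plain C++ model shows the same trick: a string class overloads operator* to hand out its raw character pointer, which printf-style %s formatting requires. MyString and FormatWarning are hypothetical; the real FString works with TCHAR (and the TEXT() macro) rather than plain char.

```cpp
#include <cstdio>
#include <string>
#include <utility>

// Simplified stand-in for FString.
class MyString
{
public:
    explicit MyString(std::string InText) : Text(std::move(InText)) {}

    // Like FString's operator*, returns a pointer to the raw character data.
    const char* operator*() const { return Text.c_str(); }

private:
    std::string Text;
};

// printf-style %s expects a raw character pointer, not a string object,
// so we dereference the string with * before passing it in.
std::string FormatWarning(const MyString& ActorName)
{
    char Buffer[128];
    std::snprintf(Buffer, sizeof(Buffer), "X went wrong in Actor %s", *ActorName);
    return Buffer;
}
```

Passing the string object itself to a %s specifier would compile in some setups but produce garbage or crash at runtime, which is why forgetting the * in UE_LOG is such a common mistake.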

Modules

Unreal Engine consists of a large number (1000+) of individual modules. Your game code is contained in one or multiple modules. You can place your game-specific logic in one module, and your more generic framework logic for multiple games in another to keep a separation of dependencies.

You can find examples of these code modules in your engine installation folder (eg. Epic Games\UE_5.0\Engine\Source\Runtime\AIModule) where each module has its own [YourModuleName].build.cs file to configure itself and its dependencies.

Not every module is loaded by default. When programming in C++ you sometimes need to include additional modules to access their code. One such example is AIModule: you must add it to the *.build.cs file of the module in which you wish to use it before you can access any of the AI classes it contains.

PublicDependencyModuleNames.AddRange(new string[] { "Core", "CoreUObject", "Engine", "InputCore", "AIModule", "GameplayTasks", "UMG", "GameplayTags", "OnlineSubsystem", "DeveloperSettings" });

The above is one example from ActionRoguelike.build.cs where AIModule (among several others) has been added.

You can also include additional modules through the .uproject instead of the build file. This is where the editor automatically adds modules under AdditionalDependencies when required (such as when you create a new C++ class that derives from a class in a missing module).

Ari from Epic Games has a great talk on Modules, linked below, that I recommend checking out. I’ve added a few takeaways from his talk.

Why use modules?

Better code practices/encapsulation of functionality
Re-use code easily between projects
Only ship modules you use (eg. trim out Editor-only functionality and unused Unreal features)
Faster compilation and linking times
Better control of what gets loaded and when.

Ari from Epic Games has a great video on the subject of Modules in Unreal Engine.

Garbage Collection (Memory Management)

Unreal Engine has a built-in garbage collector that greatly reduces our need to manually manage object lifetime. You’ll still need to take some steps to ensure this goes smoothly, but it’s easier than you’d think. By default, garbage collection runs roughly every 60 seconds and cleans up all unreferenced objects.

When calling MyActor->Destroy(), the Actor will be removed from the world and prepared to be cleared from memory. To let the garbage collector see your references and manage memory properly, you should add UPROPERTY() to UObject pointers in your C++. I’ll discuss that more in the section below.

It may take some time before GC kicks in and actually deletes the memory/object. You may run into this when using UMG and GetAllWidgetsOfClass. When removing a Widget from the Viewport, it will remain in memory and is still returned by that function until GC kicks in and has verified all references are cleared.

It’s important to be mindful of how many objects you are creating and deleting at runtime as Garbage Collection can easily eat up a large chunk of your frame time and cause stuttering during gameplay. There are concepts such as Object Pooling to consider.
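Conceptually, the collector starts from a set of root objects, follows every reference it knows about, and frees whatever it could not reach. The plain C++ sketch below models that mark-and-sweep idea, with "Tracked" references standing in for UPROPERTY pointers and "Untracked" references standing in for raw pointers the collector cannot see. Object, Mark and CollectGarbage are hypothetical; Unreal’s real collector is far more involved (clustering, incremental purging, and so on).

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical object with two kinds of references.
struct Object
{
    std::string Name;
    std::vector<Object*> Tracked;    // like UPROPERTY() pointers: visible to GC
    std::vector<Object*> Untracked;  // like raw pointers: invisible to GC
};

// Mark phase: records everything reachable through tracked references.
void Mark(Object* Obj, std::unordered_set<Object*>& Reachable)
{
    if (Obj == nullptr || !Reachable.insert(Obj).second)
    {
        return;  // null, or already visited
    }
    for (Object* Ref : Obj->Tracked)
    {
        Mark(Ref, Reachable);
    }
    // Untracked references are deliberately ignored.
}

// Sweep phase: returns the names of objects that would be collected.
std::vector<std::string> CollectGarbage(const std::vector<Object*>& Roots,
                                        const std::vector<Object*>& AllObjects)
{
    std::unordered_set<Object*> Reachable;
    for (Object* Root : Roots)
    {
        Mark(Root, Reachable);
    }
    std::vector<std::string> Collected;
    for (Object* Obj : AllObjects)
    {
        if (Reachable.count(Obj) == 0)
        {
            Collected.push_back(Obj->Name);
        }
    }
    return Collected;
}
```

In this model an object held only through a raw pointer gets collected while your code still points at it, leaving a dangling pointer. That is the practical reason to mark UObject pointers with UPROPERTY().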

Automatic Updating of References (Actors & ActorComponents)

References to Actors (and ActorComponents) can be automatically nulled after they get destroyed. For this to work you must mark the pointer with UPROPERTY() so it can be tracked properly.

// SInteractionComponent.h
UPROPERTY()
TObjectPtr<AActor> FocusedActor;

“Destroyed actors don’t have references to them nulled until they’re actually garbage collected. That’s what IsValid(yourobject) is used for checking.” - Ari Arnbjörnsson

You can read more about automatic updating of references on the official docs. The thing to keep in mind is that it only works for Actor and ActorComponent derived classes.

In UE5 the behavior for automatically clearing RawPtrs / ObjectPtrs will change.

“This will be changing a bit in UE5. The GC will no longer clear UPROPERTY + RawPtr/TObjectPtr references (even for Actors) but instead mark them as garbage (MarkAsGarbage()) and not GC them. The only way to clear the memory will be to null the reference or use weak pointers.” - Ari Arnbjörnsson. I will update this post once the new behavior has been enabled by default.

TWeakObjectPtr

Weak Object Pointer. This is similar to pointers like UObject*, except that we tell the engine that we don’t want to keep the object in memory if we are the last piece of code referencing it. UObjects are automatically destroyed and garbage collected when no code is holding a (hard) reference to them. Use weak object pointers where your reference should not keep the object alive, so objects can still be GC’ed when needed.

// UGameAbility derived from UObject
TWeakObjectPtr<UGameAbility> MyReferencedAbility;

Now we don’t try to hold onto the object explicitly and it can be garbage collected safely. Before accessing the object, we must call .Get() which will attempt to retrieve the object from the internal object array and makes sure it’s valid. If it’s no longer a valid object, a nullptr is returned instead.

UGameAbility* Ability = MyReferencedAbility.Get();
if (Ability)
{
	// The object was still valid when Get() was called; safe to use Ability here
}
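The C++ standard library has a similar concept in std::weak_ptr, which makes for a handy mental model. This is only an analogy: std::weak_ptr observes a reference count, while TWeakObjectPtr checks the object against Unreal’s internal object array. Also, lock() returns an owning std::shared_ptr that keeps the object alive while you use it, whereas TWeakObjectPtr::Get() returns a plain pointer. GameAbility and DescribeAbility are hypothetical names.

```cpp
#include <memory>
#include <string>

struct GameAbility
{
    std::string Name;
};

// Returns the ability's name if it still exists, or an empty string.
std::string DescribeAbility(const std::weak_ptr<GameAbility>& WeakAbility)
{
    // lock() plays the role of TWeakObjectPtr::Get(): it yields null once the object is gone.
    if (std::shared_ptr<GameAbility> Ability = WeakAbility.lock())
    {
        return Ability->Name;
    }
    return "";
}
```

The pattern is the same in both worlds: never dereference the weak pointer directly, always fetch and null-check first.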

Class Default Object

The Class Default Object (CDO) is the default instance of a class in Unreal Engine. This instance is automatically created and used to quickly instantiate new instances. You can use the CDO in other ways too, to avoid having to manually create and maintain an instance.

You can easily get the CDO in C++ via GetDefault<T>(). Take care not to accidentally make changes to the CDO, as these will bleed over into any new instance created for that class.

Below is one example from SaveGameSubsystem using the ‘class default object’ to access DeveloperSettings (Which can contain Project & Editor Settings to access in your game code) without first creating a new instance.

// Example from: SSaveGameSubsystem.cpp (in Initialize())

const URogueSaveGameSettings* Settings = GetDefault<URogueSaveGameSettings>();

// Access default value from class
CurrentSlotName = Settings->SaveSlotName;
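Why do changes to the CDO bleed over? In a simplified model, new instances start out as a copy of the class’s single default instance. The plain C++ sketch below shows that behavior with a hypothetical SaveGameSettings class whose GetDefault and MutableDefault members only mimic the idea of Unreal’s GetDefault<T>() and GetMutableDefault<T>(); the engine’s actual object construction is more involved.

```cpp
#include <string>

// Hypothetical settings class with one shared default instance (a model of a CDO).
struct SaveGameSettings
{
    std::string SaveSlotName = "SaveGame01";

    // The single default instance, created on first use.
    static SaveGameSettings& MutableDefault()
    {
        static SaveGameSettings Default;
        return Default;
    }

    // Read-only access, comparable in spirit to GetDefault<T>().
    static const SaveGameSettings& GetDefault() { return MutableDefault(); }

    // New instances start as a copy of the default instance.
    static SaveGameSettings CreateInstance() { return MutableDefault(); }
};
```

Instances created before the default was modified keep their old values, but every instance created afterwards inherits the change. Preferring the const access path for reading settings avoids this class of bug.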

Asserts (Debugging)

If you need to be sure that something is not nullptr, or that a specific (if-)statement is true, and want the code to tell you when it isn’t, you can use Asserts. Asserts are great as additional checks in places where a silent failure would cause code later down the line to fail too (which may then take a while to debug and trace back to its origin).

Two common assertion types are check and ensure.

check(MyValue == 1); // treated as fatal error if statement is 'false'
check(MyActorPointer);

// convenient to break here when the pointer is nullptr we should investigate immediately
if (ensure(MyActorPointer)) // non-fatal, execution is allowed to continue, useful to encapsulate in if-statements
{
}

ensure() assert is great for non-fatal errors and is only triggered once per session. You can use ensureAlways() to allow the assert to trigger multiple times per session. But make sure the assert isn’t in a high-frequency code path for your own sake or you’ll be flooded with error reports.
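The "once per session" behavior is easy to picture with a sketch. Below is a hypothetical MY_ENSURE macro in plain C++ (not Unreal’s implementation) that evaluates the condition every time but reports only the first failure per call site. Each macro expansion creates a distinct lambda, so each call site gets its own static flag. Unreal’s real ensure does more, such as logging a callstack; ReportEnsureFailure and GReportCount are placeholders for that reporting.

```cpp
#include <cstdio>

// Hypothetical counter standing in for an error report being sent.
static int GReportCount = 0;

inline void ReportEnsureFailure(const char* Expr, const char* File, int Line)
{
    ++GReportCount;
    std::fprintf(stderr, "Ensure failed: %s (%s:%d)\n", Expr, File, Line);
}

// Evaluates Expr every time, reports only the first failure at this call site,
// and returns the result so it can be used inside an if-statement.
#define MY_ENSURE(Expr) ([&]() -> bool {                  \
    static bool bHasFired = false;                        \
    const bool bResult = static_cast<bool>(Expr);         \
    if (!bResult && !bHasFired)                           \
    {                                                     \
        bHasFired = true;                                 \
        ReportEnsureFailure(#Expr, __FILE__, __LINE__);   \
    }                                                     \
    return bResult;                                       \
}())

// Example call site that may run many times per session.
bool ProcessTarget(const int* Target)
{
    if (MY_ENSURE(Target != nullptr))
    {
        return true;  // normal path
    }
    return false;     // failure is handled gracefully; execution continues
}
```

Calling ProcessTarget(nullptr) repeatedly produces a single report, which is exactly why the "always" variant should stay out of high-frequency code paths.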

It’s good to know that Asserts are compiled out of shipping builds by default, so they won’t negatively affect runtime performance for your end-user.

By adding these asserts you are immediately notified of the (coding) error. One tip I would give here is to only use them for potential coder mistakes, and perhaps not when a piece of content isn’t assigned by a designer. Running into asserts isn’t as useful for designers: to them it looks like a crash (unless they have an IDE attached) or stalls the editor for a bit while a minidump is created, and it doesn’t give them a valuable piece of information. For them, logs and on-screen prints that tell them what they did not set up properly may be better. I sometimes still add asserts for content mistakes, as this is very useful in solo or small team projects.

Core Redirects

Core Redirects are a refactoring tool. Through the configuration files (.ini), they let you redirect pretty much any class, function, name, etc. after your C++ has changed. This can be incredibly helpful in reducing the massive headache of updating your Blueprints after a C++ change.

The official documentation does a pretty good job of explaining how to set this up. It’s one of those things that’s good to know before you need it. Modern IDEs with proper Unreal Engine support, such as JetBrains Rider, can even create these redirects for you when you refactor your Blueprint-exposed C++ code.
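As a rough sketch, redirects live in a [CoreRedirects] section of your project’s DefaultEngine.ini. The module name ActionRoguelike matches the build file mentioned earlier, but the class, property and function names below are hypothetical renames for illustration only; double-check the exact path format against the official documentation for your engine version.

```ini
; DefaultEngine.ini
[CoreRedirects]
; Hypothetical class rename: ARogueMagicProjectile -> ARogueProjectile (names are written without the A/U prefix)
+ClassRedirects=(OldName="/Script/ActionRoguelike.RogueMagicProjectile",NewName="/Script/ActionRoguelike.RogueProjectile")
; Hypothetical property rename on ARogueCharacter
+PropertyRedirects=(OldName="/Script/ActionRoguelike.RogueCharacter.AttackAnim",NewName="/Script/ActionRoguelike.RogueCharacter.PrimaryAttackAnim")
; Hypothetical function rename on URogueActionComponent
+FunctionRedirects=(OldName="/Script/ActionRoguelike.RogueActionComponent.AddAction",NewName="/Script/ActionRoguelike.RogueActionComponent.GrantAction")
```

With redirects like these in place, existing Blueprints that referenced the old names can load against the renamed C++ types instead of breaking.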

Closing

I hope this article provided you with some new insight into C++ and how it’s used in Unreal Engine. This article has mainly focused on the uncommon aspects that are unique to Unreal Engine and how they apply within that context rather than C++ or programming in general.

Why stop here? Dive deeper into the world of C++ and Unreal Engine with my industry proven course! Used by thousands of Indie & AAA developers around the world!

As always, follow me on Twitter/X for more Unreal Engine insights!

On The Horizon…

Things that didn’t quite make it in yet or require a more detailed explanation in the current sections. Leave your suggestions in the comments!

Unreal Header Tool / Unreal Build Tool (“Unreal Build System”)
Project Structure (Game, Engine, build.cs, Target, binaries, .uproject)
Including other classes (and how to find their path)
Hot Reloading & Live Coding in UE5.0
IDE recommendations and setup
Timers, Async actions (Latent), Multi-threading
Game Class Hierarchy and most commonly used classes (primer).
virtual/override keywords. (“Virtual Functions and Polymorphism”)
‘const’ keyword & const correctness
Operator Overloading (examples of where Unreal has done so, eg. with FString when used with logging)

References & Further Reading

Unreal Engine Game Optimization on a Budget

2022-11-03T00:00:00+00:00

For the JetBrains GameDev Day, I was invited to give a talk about Unreal Engine. I decided to make it about game optimization in Unreal Engine, a topic I’ve been spending a lot of time with recently, and I wanted to share some tips and tricks. The 45-minute slot only had room for so much, so expect more performance-oriented blog posts from me soon!

Certain rendering features are not supported by Unreal Engine 5’s Nanite Virtualized Geometry. These limitations are called out in the individual sections.

Talk Motivation and Contents

“On a budget” in the title of the talk refers to cheap and easy-to-apply optimizations for a wide range of projects. I won’t be talking about highly complex custom systems or engine modifications.

I recommend you watch the full presentation; this summarized version contains only brief notes with each slide.

Profiling Preparations

Before you start profiling, make sure your setup won’t skew the results. Here is a brief checklist of things to keep in mind: disable vsync and other framerate-limiting features, and build your lighting. Unbuilt lights can drastically influence performance and muddy your results while profiling, as slower render paths are used.

Ideally, when profiling with tools such as Unreal Insights, you package your game rather than running it from within the editor. Besides very different memory usage and more hitching during level streaming, your frame timings may be quite different in an editor build as well. Running the game in ‘Standalone’ is still very convenient; if you do, make sure your Editor viewport has ‘Realtime’ disabled and is minimized.

r.vsync 0
t.maxfps 0
SmoothFrameRate = false (Project Settings)
Lighting Built & MapCheck Errors resolved
Packaged Game build
- Editor ‘Standalone’ is convenient (however memory and certain timings may be inaccurate)

Find the Bottleneck

You should not be blindly optimizing code in your project. Instead, measure and find your bottleneck. With the Game Thread, Render Thread, and GPU all running asynchronously from each other, it’s important to know which one is your bottleneck, or you are not going to see any meaningful performance gains.

Game Thread / Render Thread / GPU
- Unreal Insights
- ProfileGPU + r.RHISetGPUCaptureOptions 1
- stat unitgraph
- stat detailed
- r.screenpercentage 20
- pause (Freeze Game Thread)
Memory & Loading
- Unreal Insights (-trace=memory,loadtime,file)
- memreport -full
- loadtimes.dumpreport

Unreal Insights

Unreal Insights is the new flagship profiling tool, introduced in late Unreal Engine 4 and still seeing major improvements in 5.0, such as more advanced Memory profiling.

Detailed Insights into the frame timings:
- CPU/GPU
- Memory
- File Loading
- Threading
Drill down on a single frame or session

Trace Channels

Some common trace channels to use on your game executable or in Standalone. The -statnamedevents argument provides more detailed information on object names.

-trace=log,cpu,gpu,frame,bookmark,loadtime,file,memory,net
-statnamedevents

Full list of Trace Channels

Bookmarks

Bookmarks add contextual information about changes and transitions that happen during the profiling session. This includes streaming in new levels, executing console commands, starting a cinematic sequence, etc. You can easily add new bookmarks in your own game code for more context. When profiling, enable the bookmark trace channel.

Bookmarks for context and transitions

GC (Garbage Collection)
Sequencer Start
Level streaming (Start/Complete)
Console Commands

C++: TRACE_BOOKMARK(Format, Args)

Add new ‘stat’ profiling

For your C++ game code, it can be valuable to include additional profiling details by adding your own stat tracing. By default your blueprint functions will only show up as ‘Blueprint Time’, adding custom profiling will add more details on where this time is spent if that Blueprint called into your C++ game code. This is relatively straightforward to do and is detailed in my blog post below.

Add profiling detail to your game code
Track as “stat YourCategory” in the viewport or via Insights.

I previously wrote about this topic in Profiling Stats (Stat Commands).

Unreal Insights Tips

It may prove valuable to run some commands during a profiling session to see in great detail how they affect your frame. This is especially true because some features, such as Skeletal Meshes, are first processed on the Game Thread and then handled by the Render Thread later that frame.

Run commands to compare during the session (Shows as Bookmark)
- r.ScreenPercentage 20
- pause
Use only necessary Trace Channels for lower overhead
Add custom Bookmarks for gameplay context

Memreport -full

memreport -full provides a great insight into your memory usage and whether assets are loaded unintentionally. Drilling down into a specific asset type with obj list class= will provide further details on the most expensive assets. You can use this information to know which assets to optimize and review whether they should be in memory at this point at all.

memreport -full
- Runs a number of individual commands for memory profiling
obj list class=
- Example: obj list class=AnimSequence
Only in Packaged Builds for accurate results
- Example: AnimSequence is twice as large in editor builds.

DumpTicks

DumpTicks is a great first step to optimizing Game Thread performance. Dump all ticking objects to review what should be ticking or whether they can be disabled.

dumpticks / dumpticks grouped
- Outputs all Actor and Component Ticks
listtimers
- Run on low frequency
- avoid heavy load (stuttering)
stat uobjects
Disable/Reduce further with Significance Manager
- More on that later…

Collision & Physics

By default meshes in your scenes will have both physics and collision enabled. This can be wasteful if you don’t use physics and especially if a lot of them are moving around. Player movement only requires ‘QueryOnly’ on objects and so it’s possible you are wasting CPU and memory on loading and maintaining physics bodies that remain unused.

Unreal is configured to just work out of the box.
- “Collision Enabled” => Physics + Query
- Most things require just ‘QueryOnly’
Disable Components that players can’t reach or interact with.
Profiling
- stat physics
- stat collision
- obj list class=BodySetup
- show CollisionPawn
- show CollisionVisibility

Tip: Landscape Components may use lower collision MIPs to reduce memory overhead and collision complexity.

Moving SceneComponents

Moving game objects with a lot of SceneComponents is far from free. Especially if you use default settings. There are some easy optimizations to apply which can greatly reduce CPU cost.

Move/Rotate only once per frame
Disable Collision & GenerateOverlaps=False
AutoManageAttachment
- Audio & Niagara
Profiling
- stat component

Two large yellow ‘MoveComponent’ sections due to separate SetActorLocation and SetActorRotation calls.

Component Bounds

While not expensive on a per-component basis, with tons of PrimitiveComponents in a single Blueprint this can add up. Be careful when re-using the parent’s bounds, as the child may end up outside those bounds when the object animates, which causes render popping as the camera starts to look away.

UseAttachParentBound=True
- Skips “CalcBounds”
show Bounds or showflag.bounds 1

Significance Manager

Significance Manager provides a bare-bones framework to calculate a ‘significance’ value for gameplay objects and scale down their features on the fly. You might reduce the tickrate on distant AI agents, or disable animation entirely until they get close enough. This system will be highly specific to your game and will be especially helpful for non-linear experiences where you can’t rely on trigger volumes to disable these gameplay objects.

Significance Manager is often only briefly mentioned but can be challenging to get started with. I’m currently writing a blog post and have some example code on my GitHub. The implementation can be pretty straightforward depending on your needs, so it’s a worthwhile system to explore!

Scale down fidelity based on game-specific logic
- Distance To
- Max number of objects in full fidelity (‘buckets’)
Calculates ‘significance value’ to scale-down game objects.
- Examples: NPCs, puzzle Actors, Vehicles, other Players
Reduce/Cull:
- Tick rate
- Traces / Queries
- Animation updates (SKs)
- Audio/Particle playback or update rate
Profiling
- ShowDebug SignificanceManager
  - sigman.filtertag
- stat significancemanager
Examples
- GitHub.com/tomlooman/ActionRoguelike
- USSignificanceComponent.h

Occlusion Culling

Occlusion Culling is often a costly part of your frame and something that may be difficult to tackle without knowing what’s adding this cost and the tools available to optimize. The easiest is to reduce the number of considered primitives. This is where level streaming, HLOD, and distance culling can be a great help.

Note: Nanite in UE5 has an entirely different occlusion culling system (two-pass HZB) running on the GPU. It no longer relies on GPU occlusion query results read back in the next (N+1) frame. Non-Nanite geometry in UE5 can still use this ‘old’ behavior.

Frustum Culling and Occlusion Queries
GPU query results polled in next frame
HLOD can greatly reduce occlusion cost (See below)
Profiling
- r.visualizeoccludedprimitives 1
- stat initviews

Modular mesh building, many occluded parts

Single HLOD generated for static geometry.

RenderDoc: Occlusion Query Results

RenderDoc is a fantastic tool to help dissect and understand how Unreal is rendering your frame. In this example, I use the DepthTest to visualize the occlusion query result. You may find you are sending hundreds of queries with boxes of only a few pixels in size that had no chance of ever succeeding or the tiny mesh even being relevant to the frame once rendered.

DepthTest Overlay in RenderDoc
Easily find ‘wasteful’ queries on tiny/far objects

Note: As mentioned in the previous section. Nanite does not issue individual GPU occlusion queries. This visualization can still be used for non-Nanite meshes.

Distance Culling

Distance Culling is an effective way to reduce the cost of occlusion. Small props can be distance culled using a per-instance setting or using Distance Cull Volume to map an object Size with cull Distance. Objects culled this way don’t need GPU occlusion queries, which can significantly cut cost.

Distance Culling is not supported for Nanite. Non-Nanite geometry, such as translucent meshes, still supports it.

PrimitiveComponent: Max/Min Draw Distance
- Light Cones, Fog Volumes, Blueprint Components
Distance Cull Volume
- Maps object “Size” with “CullDistance”
- Reduce Occlusion Query cost
Profiling
- showflag.distanceculledprimitives 1
- stat initviews

Min/Max Draw Distance

MinDrawDistance may be useful to cull up-close translucent surfaces that cause a lot of overdraw and don’t necessarily contribute a lot to your scene (eg. it might even fade out when near the camera in the material, this still requires the pixel to be evaluated).

Example: Light Cones
Vis: Shader Complexity
- Pixel Overdraw
DistanceCullFade
- Blends 0-1, 1-0
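The min/max check itself is simple. A hypothetical plain C++ version, assuming a value of 0 means "no limit" for either bound (as it does for MaxDrawDistance), could look like this:

```cpp
#include <cassert>

// Hypothetical per-primitive settings in the spirit of MinDrawDistance /
// MaxDrawDistance. Assumption: 0 disables the respective bound.
struct DrawDistanceSettings {
    double minDrawDistance = 0.0;
    double maxDrawDistance = 0.0;
};

bool PassesDrawDistance(const DrawDistanceSettings& s, double distanceToCamera) {
    assert(distanceToCamera >= 0.0);
    if (s.minDrawDistance > 0.0 && distanceToCamera < s.minDrawDistance) {
        return false; // too close: culled before any pixel work happens
    }
    if (s.maxDrawDistance > 0.0 && distanceToCamera > s.maxDrawDistance) {
        return false; // too far: culled, no occlusion query needed
    }
    return true;
}
```

The difference from a material-side fade is that a primitive rejected here never reaches the pixel shader at all.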

Min/Max Draw Distance is not supported for Nanite.

Default scene with many overlapping surfaces

Min and Max Draw Distance set

FreezeRendering

Freeze the occlusion culling to see whether your scene is properly occluded or if certain Actors are still rendered unexpectedly.

FreezeRendering does not work with Nanite.

‘FreezeRendering’ + ; (semi-colon) to fly with DebugCamera
Verify occlusion is working as expected

Player looking toward building

FreezeRendering enabled

Light Culling (Stationary & Movable)

Lights can still add considerable cost to your render thread even if they contribute little or nothing to the frame. Fading them out at range can help, and make sure they don’t move or change unless they absolutely have to. Avoid overlapping more than four stationary lights, or one of them will be forced to Movable, adding considerable cost to your frame.

Automatic ScreenSize culling is not strict enough
- MinScreenRadiusForLights (0.03)
Cull earlier case-by-case
- MaxDrawDistance
- MaxDistanceFadeRange
Profiling
- Show > LightComplexity (Alt+7)
- Show > StationaryLightOverlap
- ToggleLight

Too many overlapping stationary lights
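To illustrate why the fifth overlapping light loses out, here is a simplified greedy channel assignment in plain C++. It only illustrates the constraint (four shadow channels, and overlapping lights cannot share one); it is not the engine's actual assignment logic, which may order or prioritize lights differently. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kMaxStationaryChannels = 4;
constexpr int kNoChannel = -1; // in this sketch: the light falls back to Movable

// overlaps[i][j] is true when lights i and j influence a shared region.
// Each light greedily takes the lowest channel not used by an earlier overlapping light.
std::vector<int> AssignStationaryChannels(const std::vector<std::vector<bool>>& overlaps) {
    const std::size_t count = overlaps.size();
    std::vector<int> channels(count, kNoChannel);
    for (std::size_t i = 0; i < count; ++i) {
        assert(overlaps[i].size() == count);
        bool used[kMaxStationaryChannels] = {false, false, false, false};
        for (std::size_t j = 0; j < i; ++j) {
            if (overlaps[i][j] && channels[j] != kNoChannel) {
                used[channels[j]] = true;
            }
        }
        for (int c = 0; c < kMaxStationaryChannels; ++c) {
            if (!used[c]) {
                channels[i] = c;
                break;
            }
        }
    }
    return channels;
}
```

With five lights that all overlap, the first four get channels 0 to 3 and the fifth gets none. Lights that don't overlap can all reuse channel 0.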

Level Streaming

Level Streaming should be considered early in the level design to avoid headaches later. This includes splitting up level sections into sublevels and thinking about good moments to load/unload these levels.

Besides potentially reducing memory load significantly, it can greatly reduce occlusion cost by keeping more levels hidden (or unloaded entirely) for as long as possible. bShouldBeVisible can be used in C++/Blueprint to hide a level. This keeps it in memory but takes it out of consideration for occlusion, etc.

Streaming Volumes vs. Manual Load/Unload
- Camera Location based (caution: third person view and cinematic shots)
- Cannot combine both on the same sublevel, but can mix both approaches within the game
Profiling
- stat levels
- Loadtimes.dumpreport (+ loadtimes.reset)
- Unreal Insights
  - Look for level load & “GC” bookmarks
  - loadtime,file trace channels
Performance Impacts
- Initial level load time
- Occlusion cost
- Memory
Options: Load, LoadNotVisible, LoadVisible
- Keep in memory while hiding to aid the renderer
Consider streaming early in Level Design!
- Splitting into multiple ULevels
- Line of sight, natural corridors and points of no return

Animation

The Animation Optimization documentation page contains more information about the tips presented in the talk.

Fast Path

Allow ‘Fast Path’ by moving computations out of the AnimGraph (into the EventGraph)
- Use WarnAboutBlueprintUsage to get warnings in AnimGraph
Profiling
- stat anim

Quick Wins

Skeletal Meshes add a chunky amount of processing to your CPU threads. There are some easy wins to look into when you have many SKs alive at a time, especially if they don’t always contribute to the frame.

Update Rate Optimization (URO) for distant SkelMeshes
VisibilityBasedAnimTickOption (per-class and config variable in DefaultEngine.ini)
- OnlyTickPoseWhenRendered
- AlwaysTickPoseAndRefreshBones
- …
More Bools!
- bRenderAsStatic
- bPauseAnims
- bNoSkeletonUpdate
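Two of these quick wins can be sketched together in plain C++. The enum mirrors the intent of two VisibilityBasedAnimTickOption values. The screen-size thresholds and skip rates are hypothetical placeholders rather than URO's defaults, and the sketch ignores any interpolation between pose updates.

```cpp
#include <cassert>

enum class TickOption { AlwaysTickPoseAndRefreshBones, OnlyTickPoseWhenRendered };

// Hypothetical URO-style rule: the smaller the mesh is on screen,
// the fewer frames actually update its pose.
unsigned UpdateRateForScreenSize(double screenSize) {
    assert(screenSize >= 0.0);
    if (screenSize > 0.4) return 1; // large on screen: every frame
    if (screenSize > 0.2) return 2; // medium: every 2nd frame
    return 4;                       // small or distant: every 4th frame
}

bool ShouldUpdatePose(TickOption option, bool wasRenderedLastFrame,
                      double screenSize, unsigned frameCounter) {
    if (option == TickOption::OnlyTickPoseWhenRendered && !wasRenderedLastFrame) {
        return false; // off-screen: skip the pose update entirely
    }
    return frameCounter % UpdateRateForScreenSize(screenSize) == 0;
}
```

The two checks stack: an off-screen mesh is skipped entirely, and a visible but tiny mesh still only updates on a fraction of frames.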

Animation Compression Library (ACL)

This animation compression library has cut the memory size for animations in half in the most recent title I worked with. Far greater decompression speeds can improve loading times as well. It works independently from Oodle (below).

The ACL plugin ships built into Unreal Engine 5.3+. Projects migrated to 5.3+ may still need to manually update their existing animations to compress using ACL.

ACL Plugin (by Nicholas Frechette)
Compression speed-up (from minutes to seconds, 56x faster!)
Decompression Speed (8.4x faster)
Memory Size (cut in half across the game)
Used in Fortnite and other AAA titles

Oodle Data & Oodle Texture

Oodle has been providing incredible compression for years and now ships with Unreal out of the box. It can greatly reduce packaged game size, and with faster decompression, it can improve load times as well!

RDO (Rate Distortion Optimization) Compression
- Significant gains in compression compared to the default
- Takes longer to compress (off by default in-editor)
RDO works with Oodle Data by ‘preparing’ the texture data

SynthBenchmark

Scalability is a critical concept to allow your game to run on a wide range of devices. The hardware benchmark tool helps you evaluate the power of the machine the game is running on and apply a base layer of scalability (Low to Epic in the available categories such as Shadow Rendering, View Distance, etc.).

I wrote a blog post about applying Hardware Benchmark for default scalability.

Run CPU/GPU benchmark and apply Scalability Settings
Returns a “score” where 100 is the baseline for an average CPU/GPU
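Turning a benchmark score into a scalability level can be sketched as counting how many thresholds the score beats. This plain C++ version uses placeholder thresholds (not engine defaults) and assumes a category that depends on both CPU and GPU takes the weaker of the two results. The function names are hypothetical.

```cpp
#include <array>
#include <cassert>

enum class QualityLevel { Low = 0, Medium = 1, High = 2, Epic = 3 };

// The level is the number of ascending thresholds the performance index exceeds.
QualityLevel LevelFromScore(double perfIndex, const std::array<double, 3>& thresholds) {
    assert(thresholds[0] <= thresholds[1] && thresholds[1] <= thresholds[2]);
    int level = 0;
    for (double t : thresholds) {
        if (perfIndex > t) {
            ++level;
        }
    }
    return static_cast<QualityLevel>(level);
}

// Assumption: a category bound by both CPU and GPU uses the lower of the two levels.
QualityLevel CombinedLevel(double cpuIndex, double gpuIndex,
                           const std::array<double, 3>& cpuThresholds,
                           const std::array<double, 3>& gpuThresholds) {
    const QualityLevel cpu = LevelFromScore(cpuIndex, cpuThresholds);
    const QualityLevel gpu = LevelFromScore(gpuIndex, gpuThresholds);
    return cpu < gpu ? cpu : gpu;
}
```

With placeholder thresholds of 50, 100 and 150, an average machine (score 100) beats only the first threshold and lands on Medium.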

Shadow Proxies

Using Shadow Proxies is a manual process to reduce the often significant shadow rendering cost in your scene. You might have beautiful, modular buildings that cause a ton of draw calls and potentially millions of triangles just for shadow depth rendering. A big downside of this system is its manual and destructive workflow. I still wanted to point this trick out, and with UE5’s Geometry Script, generating simplified proxy meshes on the fly may be only a few nodes away!

Your mileage may vary greatly with Nanite geometry. Additional testing is required to see whether this is still a viable trick for certain Nanite geometry, such as foliage.

Single low-poly silhouette mesh
- RenderMainPass=False
Bespoke mesh or using built-in Mesh Tools
- ‘Merge Actors’ (Right-Click assets in level)
- UE5 Geometry Script
Profiling
- ‘ShadowDepths’ in Insights
- ProfileGPU + r.RHISetGPUCaptureOptions 1

SizeMap (Disk & Memory)

SizeMap is a valuable tool to quickly find and address hard references in your content. This is an often hidden danger that can add considerable development cost at the end of your project, once you’re struggling with memory and load times.

Find unexpected references and bloated content
Use on Blueprints and (sub)Levels early and often

Check out Mark Craig’s recent talk on the hidden danger of Asset Dependency Chains.

Statistics Window

I often found myself using this panel to investigate opportunities to reduce memory and total map size. Landscape assets in particular will show up as huge, bloated assets. Reducing collision complexity and deleting unseen Landscape Components can help a lot here. You may find certain asset variants used only once in the level and consider swapping these out to keep them out of memory and your load screen entirely!

Stats on current level
- Primitive Stats
- Texture Stats
Tip: Shift-click for secondary sort.
- Sort ‘Count’ + ‘Tris’ or ‘Size’ (Find large assets used only once)
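To make the secondary sort concrete, here is the equivalent comparison in plain C++: Count ascending first, then Size descending, so large assets used only once float to the top. The AssetStat struct and its fields are hypothetical stand-ins for the table columns.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical row from the level statistics table.
struct AssetStat {
    std::string name;
    int count;     // number of uses in the level
    double sizeKB; // memory size of the asset
};

// Primary key: Count ascending (single-use assets first).
// Secondary key: Size descending (largest single-use assets at the top).
void SortForSingleUseHeavyAssets(std::vector<AssetStat>& rows) {
    std::sort(rows.begin(), rows.end(), [](const AssetStat& a, const AssetStat& b) {
        if (a.count != b.count) {
            return a.count < b.count;
        }
        return a.sizeKB > b.sizeKB;
    });
}
```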

Useful Console Commands

ToggleForceDefaultMaterial (Non-Nanite)
- Will show significant changes to BasePass cost as everything can render with the same shader. You can use this to compare your scene and see how your shaders are affecting it.
stat Dumphitches
- Hitches can be difficult to profile; this is a first step toward finding expensive function calls when a hitch does occur
stat none (clear all categories on screen)
r.ForceLODShadow X (CSM & Non-Nanite) or r.Shadow.NaniteLODBias (VSM + Nanite)
- For low-end platforms, forcing a shadow LOD is one of those easy tricks to significantly cut down on triangles rendered for shadows with cascaded shadow mapping. Make sure you have good LODs! When using Virtual Shadow Maps (VSM) with Nanite, r.Shadow.NaniteLODBias is the better option, biasing the LOD instead of forcing one.

Closing

Eager to learn more about Game Optimization in Unreal Engine? I got you covered with a complete course to guide you and your team through the entire process of performance optimization for games. It covers a wide range of topics, including Unreal Insights and specific CPU, GPU and memory optimizations.

To stay up-to-date with new optimization articles, sign up for my Newsletter below and follow me on Twitter!
