Performance

Configurable high-performing particles

Particle Playground and its extension components run in an asynchronous environment to make the most out of the hardware they run on. By default, all particle calculations are bundled into separate threads distributed over the available CPUs. This can be configured seamlessly (in the middle of run-time) to suit specific project or run-time needs, i.e. how calculations should be distributed and how many CPUs should be utilized.

Inside the calculation loop the Playground Manager iterates through particle data stored within built-in arrays, which keeps the time spent on each particle low.

GC Allocations

Memory allocations depend on the Multithreading setup in Particle Playground. The Thread Method can be set to No Threads (everything runs on the Main-Thread), One Per System (one thread per particle system), One For All (one asynchronous operation for all particle systems) or Automatic (bundled calculation calls). The Automatic mode bundles calculation calls onto the available CPUs/Max Threads. You will therefore see different results depending on the hardware you run on, how many particle systems are currently simulating and how Multithreading is set up. This means that you will not always see the same results in the Editor as on device, as you may have more CPUs available in the development environment.

The chart below shows GC allocations for the different thread methods with 1, 10 and 100 particle systems simulating at the same time. Each thread, along with a particle calculation cycle in a standard setup, generates 128 bytes. The test was run on Particle Playground 3.0.1 using Unity 5.3.

Particle Thread Method - GC Allocations

1x particle system (2 CPUs)

No Threads: 24 bytes
One Per System: 128 bytes
One For All: 104 bytes
Automatic: 128 bytes

10x particle systems (2 CPUs)

No Threads: 240 bytes
One Per System: 1300 bytes
One For All: 104 bytes
Automatic: 416 bytes

100x particle systems (4 CPUs)

No Threads: 2300 bytes
One Per System: 12500 bytes
One For All: 104 bytes
Automatic: 1100 bytes
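
As a rough way to read these figures, the sketch below scales each single-system measurement linearly with the number of systems and compares the result with the measured values. The byte counts are copied from the chart above; the function names are illustrative and not part of Particle Playground. Note that the 100x test ran on 4 CPUs while the others ran on 2, so the comparison is only indicative.

```python
# Measured GC allocations in bytes, copied from the chart above.
MEASURED = {
    "No Threads":     {1: 24,  10: 240,  100: 2300},
    "One Per System": {1: 128, 10: 1300, 100: 12500},
    "One For All":    {1: 104, 10: 104,  100: 104},
    "Automatic":      {1: 128, 10: 416,  100: 1100},
}


def linear_estimate(method, systems):
    """Scale the single-system measurement linearly with the system count."""
    return MEASURED[method][1] * systems


def relative_error(method, systems):
    """Relative deviation of the linear estimate from the measured value."""
    measured = MEASURED[method][systems]
    return abs(linear_estimate(method, systems) - measured) / measured


for method in MEASURED:
    for systems in (10, 100):
        print(f"{method:>14} x{systems:<3} "
              f"estimate={linear_estimate(method, systems):>6} "
              f"measured={MEASURED[method][systems]:>6}")
```

Under this model No Threads and One Per System grow roughly linearly with the number of systems, One For All stays flat, and Automatic grows far more slowly than a linear estimate would suggest.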

CPU Time

The time it takes to calculate a Particle Playground system depends heavily on the CPU, the complexity of the particle system and the settings in Particle Playground’s Multithreading options. In general, the best performance is gained using bundled calculation calls distributed over several CPUs (Particle Thread Method: Automatic). You therefore need to weigh the memory allocations generated by the threads against a fluid experience in which the Main-Thread, where most MonoBehaviour logic runs, is not flooded with calculation calls.

The chart below shows how much the Main-Thread is relieved when Particle Playground runs asynchronously. The test was run on Particle Playground 3.0.1 using Unity 5.3.

Particle Thread Method - CPU Time

1x particle system (2 CPUs)

No Threads: 0.2 milliseconds
One Per System: 0.01 milliseconds
One For All: 0.01 milliseconds
Automatic: 0.01 milliseconds

10x particle systems (2 CPUs)

No Threads: 0.75 milliseconds
One Per System: 0.05 milliseconds
One For All: 0.06 milliseconds
Automatic: 0.04 milliseconds

100x particle systems (4 CPUs)

No Threads: 7.0 milliseconds
One Per System: 0.35 milliseconds
One For All: 0.3 milliseconds
Automatic: 0.27 milliseconds
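
To put the figures in proportion, the following sketch computes how many times less CPU time each threaded method takes compared with No Threads. The timings are copied from the chart above; the helper names are illustrative only.

```python
# CPU time in milliseconds, copied from the chart above.
CPU_TIME_MS = {
    1:   {"No Threads": 0.2,  "One Per System": 0.01,
          "One For All": 0.01, "Automatic": 0.01},
    10:  {"No Threads": 0.75, "One Per System": 0.05,
          "One For All": 0.06, "Automatic": 0.04},
    100: {"No Threads": 7.0,  "One Per System": 0.35,
          "One For All": 0.3,  "Automatic": 0.27},
}


def main_thread_relief(systems, method):
    """Ratio of the No Threads time to the time of the given method."""
    row = CPU_TIME_MS[systems]
    return row["No Threads"] / row[method]


def fastest_method(systems):
    """Method with the lowest measured time (first one listed wins a tie)."""
    row = CPU_TIME_MS[systems]
    return min(row, key=row.get)


for systems in CPU_TIME_MS:
    ratio = main_thread_relief(systems, "Automatic")
    print(f"{systems:>3} systems: Automatic is about {ratio:.1f}x faster")
```

In these measurements Automatic is the fastest method at 10 and 100 systems, and all three threaded methods tie at a single system.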

Configuring Playground’s Performance

Particle Playground is centered around multithreaded solutions to give good performance on a wide variety of devices. Threads enable CPUs (Central Processing Units) to manage and process multiple requests of data at a time. This means a thread is processed asynchronously alongside other routines. CPUs often have several cores or use hyper-threading techniques. Where available, running data on a second thread in Unity relieves the Main-Thread, where most MonoBehaviour logic runs.

In the Advanced section of the Playground Manager you can configure how Particle Playground should distribute calculations onto the threads.

Thread Pool Method

There are two thread pooling methods: the .NET Thread Pool (used in previous versions) and the Playground Pool (introduced in version 3). The Playground Pool is recommended for increased performance and reduced memory allocations. Max Threads determines the number of pooled threads when Particle Thread Method is set to Automatic.

Note that multithreading is not available in a WebGL environment, so Particle Playground will simulate everything on the Main-Thread.

Thread Pool
Uses the default .NET Thread Pool. This generates 700 bytes per created thread.

Playground Pool
Uses a self-managed thread pool. This generates 128 bytes per created thread. This is the default and recommended method to use.
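
The general idea of a self-managed pool can be sketched as a fixed set of long-lived worker threads that pull jobs from a shared queue, so that no new thread is created per job. This is a generic illustration in Python and not Particle Playground's implementation; the class name SimplePool and its methods are hypothetical.

```python
import queue
import threading


class SimplePool:
    """Fixed set of long-lived worker threads fed from a shared queue."""

    def __init__(self, size):
        self._tasks = queue.Queue()
        self._workers = [
            threading.Thread(target=self._work, daemon=True)
            for _ in range(size)
        ]
        for worker in self._workers:
            worker.start()

    def _work(self):
        while True:
            job = self._tasks.get()
            if job is None:              # sentinel: shut this worker down
                self._tasks.task_done()
                break
            func, args = job
            try:
                func(*args)
            finally:
                self._tasks.task_done()

    def submit(self, func, *args):
        """Queue a job; an idle worker picks it up."""
        self._tasks.put((func, args))

    def wait(self):
        """Block until every queued job has finished."""
        self._tasks.join()

    def close(self):
        """Stop all workers after the jobs queued so far are done."""
        for _ in self._workers:
            self._tasks.put(None)
        for worker in self._workers:
            worker.join()


pool = SimplePool(2)
done = []
for i in range(10):
    pool.submit(done.append, i)   # the same two threads handle all ten jobs
pool.wait()
pool.close()
print(sorted(done))
```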

Particle Thread Method

The Particle Thread Method determines how particle calculations are distributed over the threads. Every Particle Playground system also has an override to distribute differently (Advanced > Misc > Particle Thread Method).

No Threads
The particle simulation will run on the Main-Thread. This is only recommended for debugging purposes or when distributing to a specific platform which has no more than one CPU and where no technique such as hyper-threading is available.

One Per System
A separate thread will be created for each and every particle system simulated within the scene.

One For All
All particle systems within the scene will calculate on the same thread (asynchronously).

Automatic
By default, all particle system calculations run in automatic bundled thread calls. This is the preferred setting to make the most out of the hardware while not overflowing the system with threads and memory allocations.
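
The four methods can be compared by the number of worker threads each would ask for. The sketch below is a simplified model that assumes Automatic uses at most Max Threads threads and never more than one per system; the function name and that rule are assumptions for illustration, not Particle Playground's actual scheduling code.

```python
def worker_threads(method, system_count, max_threads=8):
    """Number of worker threads a method would use under this simple model."""
    if system_count < 0 or max_threads < 1:
        raise ValueError("system_count must be >= 0 and max_threads >= 1")
    if method == "No Threads":
        return 0                          # everything stays on the Main-Thread
    if method == "One Per System":
        return system_count               # grows with the scene
    if method == "One For All":
        return 1 if system_count else 0   # a single shared thread
    if method == "Automatic":
        return min(system_count, max_threads)  # assumed cap at Max Threads
    raise ValueError(f"unknown thread method: {method}")


for method in ("No Threads", "One Per System", "One For All", "Automatic"):
    print(f"{method:>14}: {worker_threads(method, 100)} threads for 100 systems")
```

With 100 systems, One Per System asks for 100 threads in this model while Automatic stays at the Max Threads cap, which matches the description of Automatic as the setting that avoids overflowing the system with threads.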

Turbulence Thread Method

Even though the turbulence field simulation is extremely fast in Particle Playground, a high number of particles can make it benefit from running on its own threads.

Inside Particle Calculation
The turbulence is calculated inside the Particle Playground system’s calculation. Where this runs depends on the Particle Thread Method. No extra thread is created to simulate the turbulence. This is the default value, but if profiling suggests that turbulence is a bottleneck for certain particle systems within your scene, you may want to try another setting.

One Per System
A separate thread is created for each Particle Playground system which utilizes turbulence. This can be preferable if you have a few heavy particle systems within the scene.

One For All
All Particle Playground systems utilizing turbulence will bundle their turbulence calculation into the same thread. This can be preferable if you have many smaller particle systems using turbulence.

Skinned Mesh Thread Method

Extracting particle birth positions from a skinned mesh can be a heavy task if the mesh has a high vertex count. When not using a procedural mesh the vertex positions are cached and then calculated through a transform matrix from the skinned mesh’s bones.

Inside Particle Calculation
The skinned mesh will be calculated inside the particle calculation. Where this runs depends on your Particle Thread Method setting.

One Per System
A separate thread is created for each Particle Playground system which utilizes skinned meshes. This is preferable if you have a few Particle Playground systems which birth particles onto high-poly skinned meshes within your scene.

One For All
All Particle Playground systems utilizing skinned meshes will bundle their skinned mesh calculation into the same thread. This can be preferable if you have many particle systems using low-poly skinned meshes.

Max Threads

When Particle Thread Method is set to Automatic, the Max Threads setting becomes available. It determines the maximum number of pooled threads. For example, if you want to bundle calculation calls but only use two threads for the Particle Playground systems, set this value to 2. The lower this value is (default 8), the faster it is to iterate through the pool, which has an effect on CPU time.
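
A minimal sketch of what bundling with a thread cap can look like: the systems are split into at most Max Threads bundles, and each pooled thread works through one bundle. The round-robin split, the simulate stand-in and all function names are assumptions for illustration; how Particle Playground actually forms its bundles is not described here.

```python
from concurrent.futures import ThreadPoolExecutor


def bundle(systems, max_threads):
    """Split systems into at most max_threads bundles of near-equal size."""
    if max_threads < 1:
        raise ValueError("max_threads must be at least 1")
    count = min(max_threads, len(systems))
    # Round-robin assignment keeps bundle sizes within one item of each other.
    return [systems[i::count] for i in range(count)]


def simulate(system):
    """Hypothetical stand-in for one particle system's calculation step."""
    return f"{system} updated"


def run_bundle(systems_in_bundle):
    """Each pooled thread works through its whole bundle sequentially."""
    return [simulate(system) for system in systems_in_bundle]


def update_all(systems, max_threads=8):
    """Run every bundle on its own pooled thread and collect the results."""
    bundles = bundle(systems, max_threads)
    if not bundles:
        return []
    with ThreadPoolExecutor(max_workers=len(bundles)) as pool:
        return [line for chunk in pool.map(run_bundle, bundles) for line in chunk]


systems = [f"system {i}" for i in range(10)]
print([len(b) for b in bundle(systems, 2)])   # two bundles of five systems
print(len(update_all(systems, max_threads=2)))
```

Setting the cap to 2 as in the example above gives two bundles of five systems each for a scene with ten systems.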