While Starling mimics the classic display list of Flash, what it does behind the scenes is quite different. To achieve the best possible performance, you have to understand some key concepts of its architecture. Here is a list of best practices you can follow to have your game run as fast as possible.
The most important rule right at the beginning: always create a release build when you test performance. Unlike conventional Flash projects, a release build makes a huge difference when you use a Stage3D framework. The speed difference is immense; depending on the platform you're working on, you can easily get a multiple of the framerate of a debug build.
Be sure that Starling is indeed using the GPU for rendering. That's easy to check: if Starling.current.context.driverInfo contains the string “Software”, then Stage3D is in software fallback mode, otherwise it's using the GPU.
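A minimal sketch of that check, assuming it runs after Starling has created its rendering context (for example, once the root object exists). The helper name `isSoftwareRendering` is only an illustration and not part of Starling:

```actionscript
import starling.core.Starling;

// Hypothetical helper: returns true if Stage3D fell back to software rendering.
// Call it only after the Context3D is available, otherwise 'context' is null.
function isSoftwareRendering():Boolean
{
    var driverInfo:String = Starling.current.context.driverInfo;
    return driverInfo.toLowerCase().indexOf("software") != -1;
}

trace("Software rendering:", isSoftwareRendering());
```

Logging the value once at startup is usually enough; the driver does not change while the context stays alive.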
Furthermore, some mobile devices can be run in a “Battery Saving Mode”. Be sure to turn that off when making performance tests.
Adobe published an amazing performance profiling tool called “Scout”. It's very lightweight and easy to use: just compile your game with the “-advanced-telemetry” flag, then open up Scout before launching your game. It works even on mobile devices!
By default, if you use a Loader to load a PNG or JPEG image, the image data is decoded when it's used to create a Texture. This happens on the main thread and can cause your app to block when creating textures from loaded images. Try setting the image decoding policy flag to ON_LOAD, which will decode the image in the Loader's background thread. Starling's AssetManager already uses this ON_LOAD technique for images it loads.
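    // 'url' is assumed to be a URLRequest that points at the image.
    var loaderContext:LoaderContext = new LoaderContext();
    loaderContext.imageDecodingPolicy = ImageDecodingPolicy.ON_LOAD;
    loader.load(url, loaderContext);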
Find out more about this topic in the LoaderContext documentation.
ActionScript 3 contains a few gotchas that degrade performance and are best avoided. Here are the most important ones you should know about:
When working with loops that are repeated very often or are deeply nested, it's better to avoid “for each”; the classic “for i” yields better performance. Furthermore, beware that the loop condition is evaluated on every iteration, so it's faster to save the loop limit in an extra variable.
    // slowish:
    for each (var item:Object in array) { ... }

    // better:
    for (var i:int=0; i<array.length; ++i) { ... }

    // fastest:
    for (var i:int=0, l:int=array.length; i<l; ++i) { ... }
Avoid creating a lot of temporary objects. They take up memory and need to be cleaned up by the Garbage Collector, which might cause small hiccups when it's running.
    // bad:
    for (var i:int=0; i<10; ++i)
    {
        var point:Point = new Point(i, 2*i);
        doSomethingWith(point);
    }

    // better:
    var point:Point = new Point();
    for (var i:int=0; i<10; ++i)
    {
        point.setTo(i, 2*i);
        doSomethingWith(point);
    }
When you retrieve an element of a vector or array, be careful: when the element index is the result of a calculation, always cast it to int. For some reason, AS3 can access the element faster that way.
    // bad:
    var element:Object = array[10*x];

    // better: (even though 'x' is an integer!)
    var element:Object = array[int(10*x)];
As you know, Starling uses Stage3D to render all visible objects. This means that all drawing is done by the GPU.
Now, Starling could send one quad after the other to the GPU, drawing one by one. In fact, this is how the very first Starling release worked. However, for optimal performance, GPUs prefer to get a huge pile of data and draw all of it at once.
That's why newer Starling versions batch as many quads together as possible before sending them to the GPU. However, it can only batch quads that have similar properties. Whenever a quad with a different “state” is encountered, a “state change” occurs, and the previously batched quads are drawn.
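The crucial properties that make up a state include the texture (different textures from the same atlas are fine, though), the blend mode of a display object, and whether a quad is tinted.
If you set up your scene in a way that creates as few state changes as possible, your rendering performance will profit immensely.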
Some mobile hardware (e.g. the 1st iPad) has a hard time “tinting” textures, that is, drawing them translucently (alpha is not one) or in a different color (image.color set to something other than white).
For this reason, Starling optimizes the rendering code of untinted images. The downside: switching between tinted and untinted objects will cause a state change. Keep that in mind when you change an image's color or alpha value.
There's a simple trick to avoid the state changes then: just set the alpha value of your root object to “0.999” or a similar value. Since the alpha value propagates down to the children on rendering, Starling will now treat every object as tinted, and no more state changes (at least not from the color or alpha properties) will be triggered.
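As a sketch, the trick boils down to one assignment. It assumes Starling is already running and its root object has been created:

```actionscript
import starling.core.Starling;

// Assumption: the root object exists (e.g. this runs inside the root class
// or after it has been created).
// An alpha just below 1.0 is visually indistinguishable from 1.0, but makes
// Starling treat every child as tinted.
Starling.current.root.alpha = 0.999;
```

Whether this pays off depends on the scene, so it is worth comparing the draw count before and after.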
Starling's statistics display, which you can activate by enabling starling.showStats, shows you the number of draw calls that are executed per frame (third line: DRW). The more state changes you have, the higher this number will be. Your target should always be to keep it as low as possible. The following tips will show you how.
N.B. Starling explicitly decrements the draw count displayed to take into account the stats display being used.
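Enabling the statistics display could look like the following sketch. `Game` is a placeholder for your own root class, and `stage` is assumed to be the native Flash stage of the startup class:

```actionscript
import starling.core.Starling;

// 'Game' is a hypothetical root class (a starling.display.Sprite subclass).
var _starling:Starling = new Starling(Game, stage);
_starling.showStats = true; // shows the statistics box, including the DRW line
_starling.start();
```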
To know how to minimize state changes, you need to know the order in which Starling processes your objects.
Like Flash, Starling uses the Painter's algorithm to process the display list. This means that it draws your scene like a painter would do it: starting at the object at the bottom layer (e.g. the background image) and moving upwards, drawing new objects on top of previous ones.
If you'd set up such a scene in Starling, you could create three sprites: one containing the mountain range in the distance, one with the ground, and one with the vegetation. The mountain range would be at the bottom (index 0), the vegetation at the top (index 2). Each sprite would contain images that contain the actual objects.
On rendering, Starling would start at the left with “Mountain 1” and continue towards the right, until it reaches “Tree 2”. Now, if all those objects have a different state, it would have to make 6 draw calls. If you load each object's texture from a separate Bitmap, this is what will happen.
You can also use the sortChildren method of a container to sort layers, within a Sprite object for example, based on properties such as x, y, alpha etc. The method accepts a compare function, which means you can sort objects based on any criteria you wish!
Separate textures that force separate draw calls are one of the reasons why texture atlases are so important. If you load all those textures from one single atlas, Starling will be able to draw all objects at once! (At least if the other properties listed above do not change.)
Here, each image uses the same atlas (depicted by all nodes having the same color). The consequence of this is that you should always use an atlas for your textures.
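A rough sketch of building such a scene from a single atlas follows. It assumes the atlas bitmap and its XML description are already loaded; the function name and the region names ("mountain", "ground", "tree") are placeholders:

```actionscript
import flash.display.Bitmap;
import starling.display.Image;
import starling.display.Sprite;
import starling.textures.Texture;
import starling.textures.TextureAtlas;

// Hypothetical helper: all images share one atlas texture, so they can be
// drawn in a single batch as long as their other state properties match.
function createLandscape(atlasBitmap:Bitmap, atlasXml:XML):Sprite
{
    var atlas:TextureAtlas = new TextureAtlas(Texture.fromBitmap(atlasBitmap), atlasXml);
    var landscape:Sprite = new Sprite();

    landscape.addChild(new Image(atlas.getTexture("mountain")));
    landscape.addChild(new Image(atlas.getTexture("ground")));
    landscape.addChild(new Image(atlas.getTexture("tree")));

    return landscape;
}
```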
Sometimes, though, not all of your textures will fit into a single atlas. An atlas should not be bigger than 2048×2048 pixels (this is the maximum texture size on some mobile hardware), so you'll run out of space sooner or later. But this is no problem — as long as you arrange your textures in a smart way.
Both those examples use two atlases (again, one color per atlas). But while the display list on the left will force a state change for each object, the version on the right will be able to draw all objects in just two batches.
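One way to get the better arrangement is to give each atlas its own layer, so that objects from the same atlas are neighbors in the display list. The following is only a sketch; the function name and the idea of adding every texture of an atlas are assumptions made for illustration:

```actionscript
import starling.display.Image;
import starling.display.Sprite;
import starling.textures.Texture;
import starling.textures.TextureAtlas;

// Hypothetical helper: one layer per atlas, so the scene needs two batches
// instead of one state change per object.
function buildScene(atlasA:TextureAtlas, atlasB:TextureAtlas):Sprite
{
    var scene:Sprite = new Sprite();
    scene.addChild(createLayer(atlasA)); // drawn first (bottom)
    scene.addChild(createLayer(atlasB)); // drawn second (top)
    return scene;
}

function createLayer(atlas:TextureAtlas):Sprite
{
    var layer:Sprite = new Sprite();
    var textures:Vector.<Texture> = atlas.getTextures();

    for (var i:int=0, l:int=textures.length; i<l; ++i)
        layer.addChild(new Image(textures[i]));

    return layer;
}
```

This only works if the layering of your game allows it: objects from the second atlas always end up on top of those from the first.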
By minimizing state changes, you have already done a lot for the performance of your game. However, Starling still needs to iterate over all your objects, check their state, and then upload their data to the GPU — on each frame!
This is where the next optimization step comes into play. If there's a part of your game's geometry that is static and does not (or only rarely) change, call the flatten method on that sprite. Starling will preprocess the children and upload their data to the GPU. On each of the following frames, it will now be able to draw them right away, without any additional CPU processing, and without having to upload new data to the GPU.
This is a powerful feature that can potentially reduce the burden on the CPU immensely. Just keep in mind that even flattened sprites suffer from state changes: if the geometry of a flattened sprite contains different render states, it will still be drawn in multiple steps.
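In code, flattening might look like this sketch. The function names are illustrative, and `background` stands for any sprite whose children rarely change:

```actionscript
import starling.display.Sprite;

// Preprocess the children once; later frames draw the cached geometry.
function freezeBackground(background:Sprite):void
{
    background.flatten();
}

// Changes to a flattened sprite only show up after flattening it again.
function moveFirstChild(background:Sprite, dx:Number):void
{
    background.getChildAt(0).x += dx;
    background.flatten();
}

// Return to normal per-frame processing when the content becomes dynamic.
function thawBackground(background:Sprite):void
{
    background.unflatten();
}
```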
Flattened sprites are very fast and easy to use. However, they still have some overhead:
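Adding a child to a sprite dispatches ADDED and ADDED_TO_STAGE events, which can be some overhead if there are lots of children. As with any display object, each child can also be added to the display list only once.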
To get rid of these limitations as well, you can go down to the low-level class that Starling uses for all the batching internally: QuadBatch. It works like this:
    var quadBatch:QuadBatch = new QuadBatch();
    var image:Image = new Image(texture);
    quadBatch.addImage(image);

    for (var i:int=0; i<100; ++i)
    {
        image.x += 10;
        quadBatch.addImage(image);
    }
Did you notice? You can add the same image as often as you want! Furthermore, adding it won't cause any event dispatching. As expected, this has some downsides, though:
You can only add instances of the Image, Quad, or QuadBatch class.
For these reasons, it's only suitable for very specific use-cases (the bitmap font class, for example, now uses quad batches directly). In those cases, it's definitely the fastest option, though. You won't find a more efficient way to render objects in Starling.
TextFields support two different kinds of fonts: TrueType fonts and bitmap fonts.
TrueType fonts are easiest to use: simply embed the “ttf” file and you're done. For static text fields that contain hundreds of characters, they are a good and fast option. Starling will render the text into a bitmap and display the text just like a texture. For short texts that change repeatedly (e.g. a score display), this is too slow, though.
TextFields that use a Bitmap Font can be created and updated very fast. Another advantage is that they don't take up any additional texture memory except what's needed for the original texture. That makes them the preferred way of displaying text in Starling. My recommendation is to use them whenever possible.
One more thing needs to be mentioned: by default, a TextField will require one draw call, even if your glyph texture is part of your main texture atlas. That's because long texts require a lot of CPU time to batch, making the additional draw call worth the effort.
However, if your text field contains only a few letters (rule of thumb: below 16), you can enable the batchable property on the TextField. With that enabled, your texts will be batched just like other display objects.
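A short sketch of such a text field follows. It assumes a bitmap font has already been registered under the placeholder name "ScoreFont"; the function name and the sizes are arbitrary:

```actionscript
import starling.text.BitmapFont;
import starling.text.TextField;

// Hypothetical helper: a short, frequently changing label that is batched
// together with other display objects sharing the same state.
function createScoreField():TextField
{
    var scoreField:TextField = new TextField(120, 40, "Score: 0",
        "ScoreFont", BitmapFont.NATIVE_SIZE, 0xffffff);
    scoreField.batchable = true;
    return scoreField;
}
```

Batching only helps if the font texture lives in the same atlas as the neighboring objects; otherwise the texture switch causes a state change anyway.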
Mipmaps are downsampled versions of your textures, intended to increase rendering speed and reduce aliasing effects.
By default, Starling creates them for you when you load a texture (through the Texture.fromBitmap[Data] methods). In most cases, this is a good thing: it will increase rendering speed when the texture is scaled down, and you avoid ugly aliasing effects.
In some cases, however, it makes sense to disable mipmaps:
TextureSmoothing.TRILINEAR on those objects.)
In a nutshell, disabling mipmaps makes textures load faster, but slows down rendering when the texture is scaled down. To disable mipmapping, use the corresponding parameter in the Texture.fromBitmap[Data] methods.
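For illustration, a sketch of creating a texture without mipmaps, assuming the Starling 1.x signature in which the second argument of Texture.fromBitmap is the generateMipMaps flag. The function name is hypothetical:

```actionscript
import flash.display.Bitmap;
import starling.textures.Texture;

// Hypothetical helper for textures that are never scaled down.
function createTextureWithoutMipmaps(bitmap:Bitmap):Texture
{
    return Texture.fromBitmap(bitmap, false); // false = do not generate mipmaps
}
```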
The easiest way to include a texture in a game is to use the classic “[Embed]” syntax. Unfortunately, this approach wastes a lot of memory!
That's because the texture will be in memory twice: once as the embedded class that the runtime created for you, and once as the actual Starling texture.
To avoid this, do not embed your textures in the source, but instead load them from a URL or a file. Starling's AssetManager class makes this very easy. This is especially important on mobile devices, where memory is always a limiting factor.
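Loading through the AssetManager could look like the following sketch. The file paths and the texture name "background" are placeholders, and the code is assumed to run inside a Starling display object container:

```actionscript
import starling.display.Image;
import starling.utils.AssetManager;

// Placeholder paths; adjust them to your project layout.
var assets:AssetManager = new AssetManager();
assets.enqueue("textures/atlas.png", "textures/atlas.xml");

assets.loadQueue(function(ratio:Number):void
{
    // 'ratio' runs from 0.0 to 1.0; at 1.0 all queued assets are available.
    if (ratio == 1.0)
    {
        var background:Image = new Image(assets.getTexture("background"));
        addChild(background);
    }
});
```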
If you've got totally opaque, rectangular textures, help the GPU by disabling blending for those textures. This is especially useful for large background images. Don't be afraid of the additional state change this will cause; it's worth it!
backgroundImage.blendMode = BlendMode.NONE;
If the background of your game is a flat color, set the stage color to that value instead of adding a texture or a colored quad. Starling has to clear the stage once per frame, anyway — thus, if you change the stage color, that operation won't cost anything. There is such a thing as a free lunch, after all!
    [SWF(backgroundColor="#ff2255")]
    public class Startup extends Sprite
    {
        // ...
    }
The width and height properties are more expensive than one would guess intuitively, especially on sprites (a matrix has to be calculated, then each vertex of each child will be multiplied by that matrix).
For that reason, avoid accessing them repeatedly, e.g. in a loop. In some cases, it might even make sense to use a constant value instead.
    // bad: 'wall.width' is recalculated on every iteration
    // (looping backwards, so that removing a child does not skip its neighbor)
    for (var i:int=numChildren-1; i>=0; --i)
    {
        var child:DisplayObject = getChildAt(i);
        if (child.x > wall.width)
            child.removeFromParent();
    }

    // better: read the width once
    var wallWidth:Number = wall.width;
    for (var i:int=numChildren-1; i>=0; --i)
    {
        var child:DisplayObject = getChildAt(i);
        if (child.x > wallWidth)
            child.removeFromParent();
    }
When you move the mouse/finger over the screen, Starling has to find out which object is hit. This can be an expensive operation, because it has to iterate over all of your display objects and call their hitTest method.
Thus, it helps to make objects “untouchable” if you don't care about them being touched, anyway. It's best to disable touches on complete containers: that way, Starling won't even have to iterate over their children.
    // good:
    for (var i:int=0; i<container.numChildren; ++i)
        container.getChildAt(i).touchable = false;

    // even better:
    container.touchable = false;
Starling will send to the GPU any object that is part of the display list. This is true even for objects that are outside the stage bounds!
Now, why doesn't Starling simply ignore those invisible objects? The reason is that checking the visibility in a universal way is quite expensive. So expensive, in fact, that it's faster to send the object to the GPU and let it do the clipping. The GPU is actually very efficient at that, and will abort the whole rendering pipeline very early if the object is outside the screen bounds.
However, it still takes time to upload that data, and you can avoid that. Within the high-level game logic, it's often easier to make visibility checks: you can, for example, just check if the x/y coordinates are within the stage bounds (plus a small extra belt). If you've got lots of objects that are outside those bounds, it's worth the effort. Remove those elements from the stage or set their visible property to false.
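Such a check can be sketched as follows. The function name and the default margin are made up for this example, and it assumes the container sits unscaled at the stage origin, so that child coordinates can be compared directly with the stage size:

```actionscript
import starling.display.DisplayObject;
import starling.display.DisplayObjectContainer;

// Hypothetical helper: hides children whose position lies outside the stage
// bounds plus a safety belt of 'margin' pixels.
function cullChildren(container:DisplayObjectContainer,
                      stageWidth:Number, stageHeight:Number,
                      margin:Number=50):void
{
    for (var i:int=0, l:int=container.numChildren; i<l; ++i)
    {
        var child:DisplayObject = container.getChildAt(i);
        child.visible = child.x > -margin && child.x < stageWidth  + margin &&
                        child.y > -margin && child.y < stageHeight + margin;
    }
}
```

The margin should be at least as large as your biggest object, because the test looks only at the x/y position and deliberately avoids the more expensive width and height properties.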
Beginning with Starling 1.2, there are new methods for event dispatching:
    // classic way:
    object.dispatchEvent(new Event("type", bubbles));

    // new way:
    object.dispatchEventWith("type", bubbles);
The new approach will dispatch an event object just like the first one, but behind the scenes, it will pool event objects for you. That means that you will save the Garbage Collector some work if you use the second technique. So it's less code to write and is faster — thus, it's the preferred way to dispatch events now. (Except if you've got a custom subclass of Event; those cannot be dispatched with that method.)
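dispatchEventWith also accepts an optional data argument, which covers many cases that would otherwise call for a custom event class. In this sketch the event type "scoreChanged" and both function names are invented for illustration:

```actionscript
import starling.events.Event;
import starling.events.EventDispatcher;

// Sends the new score along with a pooled event object.
function notifyScoreChanged(dispatcher:EventDispatcher, score:int):void
{
    dispatcher.dispatchEventWith("scoreChanged", false, score);
}

// The listener reads the payload from the event's 'data' property.
function onScoreChanged(event:Event):void
{
    var score:int = event.data as int;
    trace("New score:", score);
}
```

Register the listener with `dispatcher.addEventListener("scoreChanged", onScoreChanged)` before dispatching.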