
60 FPS

There is a limit to what you can do in 16 ⅔ milliseconds, the time available per frame at 60 FPS. If it takes you 17 milliseconds to calculate a frame then you've missed that budget and are running at only about 58.8 FPS.
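The frame budget arithmetic can be sketched in a few lines. The function names here are made up for illustration and are not part of any library.

```python
TARGET_FPS = 60


def frame_budget_ms(fps):
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps


def effective_fps(frame_time_ms):
    """Frame rate reached if every frame takes frame_time_ms to compute."""
    return 1000.0 / frame_time_ms


print(f"budget at {TARGET_FPS} FPS: {frame_budget_ms(TARGET_FPS):.2f} ms")
print(f"rate at 17 ms per frame: {effective_fps(17):.1f} FPS")
```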

So what kinds of things end up wasting cpu time?

What can you do to speed up your code?

The information below will vary based on the hardware used but hopefully it can be of help.

I've updated the times shown to include results from Python 3.5 and 3.6. All times are in seconds. You will notice some definite speed improvements in 3.6.

If you take a look at this profiling script you will see that each class gets the same variables assigned to it. The only difference is how many functions get called in the process.

3.6 - 0.7768
3.5 - 1.1580

    class Class1:
        def __init__(self):
            self.a = 1
            self.b = 0
            self.c = None
            self.z = 2

    for x in range(1000000):
        a = Class1()

3.6 - 1.1711
3.5 - 1.5977

    class Class2:
        def __init__(self):
            self.a = 1
            self.call1()
            self.c = None
            self.z = 2

        def call1(self):
            self.b = 0

    for x in range(1000000):
        a = Class2()

3.6 - 1.5515
3.5 - 1.9694

    class Class3:
        def __init__(self):
            self.a = 1
            self.call1()
            self.c = None
            self.call2()

        def call1(self):
            self.b = 0

        def call2(self):
            self.z = 2

    for x in range(1000000):
        a = Class3()

3.6 - 1.8414
3.5 - 2.3546

    class Class4:
        def __init__(self):
            self.call0()
            self.call1()
            self.c = None
            self.call2()

        def call0(self):
            self.a = 1

        def call1(self):
            self.b = 0

        def call2(self):
            self.z = 2

    for x in range(1000000):
        a = Class4()

In the table you can see that a million loops take longer for each additional function called. The increase is at most roughly 0.4 seconds each time. 0.4 seconds over a million loops works out to about 0.0004 milliseconds per function call.
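A measurement like this can be reproduced with the standard timeit module. The sketch below is not the original profiling script. The class names Direct and OneCall are made up, and the loop count is reduced so it runs quickly. Expect the numbers to differ on your hardware and Python version.

```python
import timeit


class Direct:
    """Assigns both attributes inside __init__."""

    def __init__(self):
        self.a = 1
        self.b = 0


class OneCall:
    """Assigns the same attributes, but b goes through one extra method call."""

    def __init__(self):
        self.a = 1
        self.set_b()

    def set_b(self):
        self.b = 0


LOOPS = 100_000  # smaller than the 1,000,000 used above, to keep the run short

# timeit accepts a callable, so passing the class times its instantiation.
direct_s = timeit.timeit(Direct, number=LOOPS)
one_call_s = timeit.timeit(OneCall, number=LOOPS)

# Rough cost of the one extra method call, in milliseconds per call.
extra_ms_per_call = (one_call_s - direct_s) / LOOPS * 1000

print(f"Direct:  {direct_s:.4f} s")
print(f"OneCall: {one_call_s:.4f} s")
print(f"extra cost per call: {extra_ms_per_call:.6f} ms")
```

Single runs are noisy, so it helps to repeat the measurement a few times before trusting the difference.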

This means that you could do about 40,000 function calls in a single frame, but it also means that if you did you'd be able to do nothing else.
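The estimate can be checked with a little arithmetic. This sketch assumes the 0.0004 millisecond figure from the timings above, which will be different on other machines. The helper names are made up for illustration.

```python
FRAME_BUDGET_MS = 1000 / 60  # about 16.67 ms per frame at 60 FPS
CALL_COST_MS = 0.0004        # per-call estimate taken from the timings above


def calls_per_frame(budget_ms, call_cost_ms):
    """How many bare function calls fit in one frame and nothing else."""
    return int(budget_ms / call_cost_ms)


def budget_fraction(calls, budget_ms, call_cost_ms):
    """Fraction of the frame budget spent on call overhead alone."""
    return calls * call_cost_ms / budget_ms


print(calls_per_frame(FRAME_BUDGET_MS, CALL_COST_MS))
print(f"{budget_fraction(1000, FRAME_BUDGET_MS, CALL_COST_MS):.1%}")
```

With these numbers the limit comes out a little above 40,000 calls, and 1,000 calls per frame would use a bit over 2% of the budget on overhead alone.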

So one way to cut down on wasted time is to not create and call extra functions for no reason. I'm not saying write all of your code in as few functions as possible, just realize that each function takes time to call and be smart about it.

    class Example2:
        def __init__(self):
            self.reseta()
            self.resetb()
            self.resetc()
            self.resetz()

        def reset(self):
            self.reseta()
            self.resetb()
            self.resetc()
            self.resetz()

        def reseta(self):
            self.a = 1

        def resetb(self):
            self.b = 0

        def resetc(self):
            self.c = None

        def resetz(self):
            self.z = 2

        def do(self):
            # Do stuff
            # Then Reset
            self.reset()

    class Example1:
        def __init__(self):
            self.a = 1
            self.b = 0
            self.c = None
            self.z = 2

        def do(self):
            # Do stuff
            # Then Reset
            self.a = 1
            self.b = 0
            self.c = None
            self.z = 2

If you create an instance of Example1 and call the do function you end up calling 2 functions.

If you create an instance of Example2 and call the do function you end up calling 11 functions. You've roughly wasted 0.0036 milliseconds for the additional function calls which may not seem like much, but it all adds up.
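One way to check call counts like these is to let the interpreter count them. The sketch below uses sys.setprofile from the standard library. The count_calls helper is made up for illustration, and the two classes are compact copies of the examples above.

```python
import sys


class Example1:
    def __init__(self):
        self.a, self.b, self.c, self.z = 1, 0, None, 2

    def do(self):
        # Do stuff, then reset inline.
        self.a, self.b, self.c, self.z = 1, 0, None, 2


class Example2:
    def __init__(self):
        self.reset()

    def reset(self):
        self.reseta()
        self.resetb()
        self.resetc()
        self.resetz()

    def reseta(self):
        self.a = 1

    def resetb(self):
        self.b = 0

    def resetc(self):
        self.c = None

    def resetz(self):
        self.z = 2

    def do(self):
        # Do stuff, then reset through the helper methods.
        self.reset()


def count_calls(cls):
    """Count Python-level function calls made by cls().do()."""
    count = 0

    def profiler(frame, event, arg):
        nonlocal count
        if event == "call":  # fired once per Python function call
            count += 1

    sys.setprofile(profiler)
    try:
        cls().do()
    finally:
        sys.setprofile(None)
    return count


print("Example1:", count_calls(Example1))
print("Example2:", count_calls(Example2))
```

Note that this Example2 routes __init__ through reset, so it makes one more call than the version above. Compared with Example1 that is ten extra calls rather than nine.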

While creating a profile to test in python 3 if joining lists was faster than string concatenation I stumbled across something I was not expecting. So I created this profile to show what I found.

a = [A('Test'), A('Test2')]

3.6 - 0.3049

    for i in range(300000):
        x = ''
        for y in a:
            x += y.__str__()

3.6 - 0.4206

    for i in range(300000):
        x = ''
        for y in a:
            x += str(y)

b = [A('Test'), A('Test2'), A('Test3'), A('Test4')]

3.6 - 0.5558

    for i in range(300000):
        x = ''
        for y in b:
            x += y.__str__()

3.6 - 0.8012

    for i in range(300000):
        x = ''
        for y in b:
            x += str(y)

c = [A('Test'), A('Test2'), A('Test3'), A('Test4'), A('Test5'), A('Test6')]

3.6 - 0.8212

    for i in range(300000):
        x = ''
        for y in c:
            x += y.__str__()

3.6 - 1.1711

    for i in range(300000):
        x = ''
        for y in c:
            x += str(y)

d = [A('Test'), A('Test2'), A('Test3'), A('Test4'), A('Test5'), A('Test6'), A('Test7'), A('Test8')]

3.6 - 1.1039

    for i in range(300000):
        x = ''
        for y in d:
            x += y.__str__()

3.6 - 1.5024

    for i in range(300000):
        x = ''
        for y in d:
            x += str(y)

e = [A('Test'), A('Test2'), A('Test3'), A('Test4'), A('Test5'), A('Test6'), A('Test7'), A('Test8'), A('Test9'), A('Test10')]

3.6 - 1.3576

    for i in range(300000):
        x = ''
        for y in e:
            x += y.__str__()

3.6 - 1.9241

    for i in range(300000):
        x = ''
        for y in e:
            x += str(y)

Now if you look at the results above you will see that calling the builtin function str(object) is actually slower than calling object.__str__() on an object. I never expected there to be a big difference but the larger the list the larger the difference.

Now I'm not saying change all your code to call the __str__ function because it doesn't look as nice. But, if you have time critical code it makes sense to do things the faster way.

The results of the repr profile script will give you similar results.
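To try the comparison yourself, something like the following should work. The class A is not shown in this article, so the version here is a guess at a minimal stand-in with a __str__ method. The loop count is reduced to keep the run short.

```python
import timeit


class A:
    """Hypothetical stand-in for the class A used in the profile above."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text


items = [A('Test'), A('Test2')]


def concat_with_dunder():
    x = ''
    for y in items:
        x += y.__str__()  # direct method call on the object
    return x


def concat_with_builtin():
    x = ''
    for y in items:
        x += str(y)  # goes through the builtin first
    return x


LOOPS = 100_000
print(f"__str__(): {timeit.timeit(concat_with_dunder, number=LOOPS):.4f} s")
print(f"str():     {timeit.timeit(concat_with_builtin, number=LOOPS):.4f} s")
```

Both functions build the same string, so the only thing being compared is how the conversion is invoked.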

a = None

3.6 - 1.1987
3.5 - 1.2191

    a = None
    for i in range(20000000):
        if a == None:
            pass

3.6 - 1.0435
3.5 - 0.9911

    a = None
    for i in range(20000000):
        if a is None:
            pass

3.6 - 1.1551
3.5 - 1.2120

    a = None
    for i in range(20000000):
        if a != None:
            pass

3.6 - 0.9690
3.5 - 0.9312

    a = None
    for i in range(20000000):
        if a is not None:
            pass

b = 'Test'

3.6 - 1.2187
3.5 - 1.2708

    b = 'Test'
    for i in range(20000000):
        if b == None:
            pass

3.6 - 0.9615
3.5 - 0.9399

    b = 'Test'
    for i in range(20000000):
        if b is None:
            pass

3.6 - 1.3418
3.5 - 1.3807

    b = 'Test'
    for i in range(20000000):
        if b != None:
            pass

3.6 - 1.0072
3.5 - 1.0024

    b = 'Test'
    for i in range(20000000):
        if b is not None:
            pass

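In these tests both forms pass and fail the same way, and is is faster, so you should always use is when comparing against None. Keep in mind that is compares identity (whether two names refer to the same object) while == compares values. That makes is the right tool for singletons like None or for type objects like str or int, not for comparing values in general.

If you create 2 strings with the same text, is may say they are the same, but that is because Python can assign both variables to the same string object in the underlying C code. This is an implementation detail and should not be relied on. You can check it by using the id function on each variable. If they return the same number then they are the same object.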
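A small sketch of the difference between identity and equality. The AlwaysEqual class is made up purely to show that == can be overridden while is cannot.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()
print(obj == None)  # True, because __eq__ says so
print(obj is None)  # False, obj is not the None object

first = 'Test'
second = ''.join(['Te', 'st'])  # same text, built at run time

print(first == second)  # True, the values match
# Whether the next line prints True depends on the interpreter, which may
# or may not reuse one string object for both names.
print(first is second)
print(id(first), id(second))
```

This is why is None is the safer test: no class can change what it means.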

WARNING: Multiple assignment should only be used with immutable objects.

a = None

3.6 - 1.2433
3.5 - 1.2846

    a = None
    for i in range(20000000):
        x = a
        y = a

3.6 - 1.7525
3.5 - 1.6735

    a = None
    for i in range(20000000):
        x = a
        y = a
        z = a

3.6 - 1.2163
3.5 - 0.9998

    a = None
    for i in range(20000000):
        x = y = a

3.6 - 1.2219
3.5 - 1.1333

    a = None
    for i in range(20000000):
        x = y = z = a

b = 0

3.6 - 1.3369
3.5 - 1.2594

    b = 0
    for i in range(20000000):
        x = b
        y = b

3.6 - 1.8503
3.5 - 1.8537

    b = 0
    for i in range(20000000):
        x = b
        y = b
        z = b

3.6 - 1.3549
3.5 - 1.0023

    b = 0
    for i in range(20000000):
        x = y = b

3.6 - 1.3015
3.5 - 1.2631

    b = 0
    for i in range(20000000):
        x = y = z = b

c = {}

3.6 - 1.7085
3.5 - 1.4074

    c = {}
    for i in range(20000000):
        x = c
        y = c

3.6 - 2.3536
3.5 - 1.7338

    c = {}
    for i in range(20000000):
        x = c
        y = c
        z = c

3.6 - 1.2342
3.5 - 1.0480

    c = {}
    for i in range(20000000):
        x = y = c

3.6 - 1.5474
3.5 - 1.1935

    c = {}
    for i in range(20000000):
        x = y = z = c

Assigning something to multiple variables at the same time is fast but you need to make sure the object you are assigning is immutable.

Immutable objects like None, integers, or tuples do not allow the modification of the object in place. Changes to these variables will cause a new object to be created so changing one variable will never change another.

Mutable objects like a dictionary or list allow you to modify the object in place. This makes it appear like changes to one variable cause changes to the other variables but in fact both variables point to the same object.
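The difference is easy to see in a short example. The variable names below are arbitrary.

```python
# Immutable: "changing" x rebinds it to a new int, y is untouched.
x = y = 0
x += 1
print(x, y)  # 1 0

# Mutable: p and q are two names for the same list.
p = q = []
p.append(1)
print(p, q)    # [1] [1]
print(p is q)  # True

# If you need independent lists, create each one separately.
r, s = [], []
r.append(1)
print(r, s)  # [1] []
```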

On a side note it appears that dictionary creation is a bit slower in python 3.6 (compared to 3.5) but from other profiling it appears that using dictionaries is faster. Since most object creation in a game will occur during the loading phase the faster usage of dictionaries is definitely worth them being a bit slower to create.

Empty List or Tuple

3.6 - 4.3505
3.5 - 4.5650

    for i in range(20000000):
        x = list(())

3.6 - 3.3089
3.5 - 4.3342

    for i in range(20000000):
        x = tuple([])

3.6 - 0.6919
3.5 - 0.7779

    for i in range(20000000):
        x = ()

3.6 - 0.8845
3.5 - 1.0145

    for i in range(20000000):
        x = []

List or Tuple Containing '1'

3.6 - 4.5073
3.5 - 8.7496

    for i in range(20000000):
        x = list(('1', ))

3.6 - 4.5427
3.5 - 8.7719

    for i in range(20000000):
        x = tuple(['1'])

3.6 - 0.6661
3.5 - 0.6384

    for i in range(20000000):
        x = ('1', )

3.6 - 1.2698
3.5 - 3.4372

    for i in range(20000000):
        x = ['1']

List or Tuple Containing 1, 2

3.6 - 5.2251
3.5 - 8.5801

    for i in range(20000000):
        x = list((1, 2))

3.6 - 5.1284
3.5 - 9.0448

    for i in range(20000000):
        x = tuple([1, 2])

3.6 - 0.6754
3.5 - 0.6529

    for i in range(20000000):
        x = (1, 2)

3.6 - 1.4492
3.5 - 3.7368

    for i in range(20000000):
        x = [1, 2]

Python provides lists and tuples as sequence types. Lists are mutable, allowing you to change their contents after they are created. Tuples are immutable and, as you can see in the table above, are a lot faster to create.

You can also see that converting between a tuple and a list takes a fair amount of time. So create a list or tuple depending on what you are going to do with it. In Python 3.6 converting appears to be much faster than in 3.5, but it is still slower than creating and using the correct object.

Creating code that turns a tuple into a list so it can modify something and then convert it back to a tuple just wastes cpu time.
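As a rough illustration of that round trip, compare the two helpers below. Both function names are made up, and the timings will vary by machine and Python version.

```python
import timeit


def bump_round_trip(point):
    """Tuple in, tuple out: pays for two conversions on every call."""
    tmp = list(point)
    tmp[0] += 1
    return tuple(tmp)


def bump_in_place(point):
    """List in, same list out: no conversion needed."""
    point[0] += 1
    return point


LOOPS = 200_000
as_tuple = (1, 2)
as_list = [1, 2]

round_trip_s = timeit.timeit(lambda: bump_round_trip(as_tuple), number=LOOPS)
in_place_s = timeit.timeit(lambda: bump_in_place(as_list), number=LOOPS)

print(f"tuple -> list -> tuple: {round_trip_s:.4f} s")
print(f"list modified in place: {in_place_s:.4f} s")
```

If the data needs to change often, starting with a list avoids the conversions entirely. If it never changes, a tuple is the cheaper object to create.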

3.6 - 1.9105
3.5 - 1.9969

    class NotStaticA:
        def test(self, a):
            return a

    for i in range(5000000):
        x = NotStaticA().test(i)

3.6 - 1.8526
3.5 - 1.8878

    class StaticA:
        @staticmethod
        def test(a):
            return a

    for i in range(5000000):
        x = StaticA().test(i)

As you can see, a static method runs slightly faster than a non-static method. So if you don't use self inside a method, make it a static method and get rid of self.
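A static method can also be called on the class itself, which skips creating an instance altogether. The sketch below reuses the two classes from the table. The timings are only indicative and the loop count is reduced.

```python
import timeit


class NotStaticA:
    def test(self, a):
        return a


class StaticA:
    @staticmethod
    def test(a):
        return a


# A static method works on an instance and directly on the class.
print(StaticA().test(5))
print(StaticA.test(5))

LOOPS = 200_000
not_static_s = timeit.timeit(lambda: NotStaticA().test(1), number=LOOPS)
static_s = timeit.timeit(lambda: StaticA().test(1), number=LOOPS)
on_class_s = timeit.timeit(lambda: StaticA.test(1), number=LOOPS)

print(f"instance method:        {not_static_s:.4f} s")
print(f"static via instance:    {static_s:.4f} s")
print(f"static via class:       {on_class_s:.4f} s")
```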

Loops can be a great place where cpu time can get wasted. Listed here are a number of ways that can make a loop faster. These suggestions make a big difference if the loop is running a lot. If the loop is running 2 or 3 times then you will not see much of a difference. If the loop is only running once, why is there a loop?

The fastest loop is a loop that does nothing.

3.6 - 0.2662
3.5 - 0.3576

    for z in range(1000000):
        pass

If you want your loop to be as fast as possible try doing the following:

  1. Never create an object inside a loop that can be created above it.
    3.6 - 2.3023
    3.5 - 2.1863

        y = ()
        for z in range(10000000):
            x = (list, tuple)
            if type(y) in x:
                pass

    3.6 - 1.3294
    3.5 - 1.4352

        y = ()
        x = (list, tuple)
        for z in range(1000000):
            if type(y) in x:
                pass

  2. If you need to access an object's attribute inside the loop and the value in the attribute never changes, then assign it to a variable before the loop.
    3.6 - 0.5950
    3.5 - 0.6395

        class A:
            def __init__(self):
                self.x = 1

        y = A()
        for z in range(10000000):
            if y.x:
                pass

    3.6 - 0.3557
    3.5 - 0.3735

        class A:
            def __init__(self):
                self.x = 1

        y = A()
        x = y.x
        for z in range(10000000):
            if x:
                pass

  3. If you need to call a method of an object inside the loop and the object never changes, then assign the method (not called) to a variable before the loop. It takes time to look up a method on an object.
    3.6 - 2.3600
    3.5 - 2.6545

        x = []
        for z in range(1000000):
            x.append(z)

    3.6 - 2.1275
    3.5 - 2.0179

        x = []
        y = x.append
        for z in range(1000000):
            y(z)

  4. If you need to access an object's attribute inside the loop and the value in the attribute does change but the object is always the same, then assign the object to a variable before the loop.
    3.6 - 5.1472
    3.5 - 5.5933

        class A:
            def __init__(self):
                self.value = 0

            @property
            def x(self):
                value = self.value
                self.value += 1
                return value

        class B:
            def __init__(self, y):
                self.y = y

        b = B(A())
        for z in range(1000000):
            if b.y.x:
                pass

    3.6 - 4.4075
    3.5 - 4.8776

        class A:
            def __init__(self):
                self.value = 0

            @property
            def x(self):
                value = self.value
                self.value += 1
                return value

        class B:
            def __init__(self, y):
                self.y = y

        b = B(A())
        a = b.y
        for z in range(1000000):
            if a.x:
                pass
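Several of the tips above can be combined in one loop. The following is only a sketch with made-up function names. It hoists the tuple of types and the bound append method out of the loop and checks that the result is unchanged.

```python
import timeit


def collect_plain(items):
    """Everything is looked up or created again on each pass."""
    result = []
    for item in items:
        wanted = (list, tuple)      # new tuple built every iteration
        if type(item) in wanted:
            result.append(item)     # attribute lookup every iteration
    return result


def collect_hoisted(items):
    """Same logic with the invariant work moved above the loop."""
    result = []
    wanted = (list, tuple)          # created once
    append = result.append          # bound method looked up once
    for item in items:
        if type(item) in wanted:
            append(item)
    return result


data = [(), [], 1, 'a'] * 1000

LOOPS = 200
plain_s = timeit.timeit(lambda: collect_plain(data), number=LOOPS)
hoisted_s = timeit.timeit(lambda: collect_hoisted(data), number=LOOPS)

print(f"plain:   {plain_s:.4f} s")
print(f"hoisted: {hoisted_s:.4f} s")
```

The hoisted version is a little harder to read, so it is probably only worth doing in loops that run many times per frame.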