
Python 3

When I was starting to work on the code for my game, I found that drawing a couple hundred small images in Panda3D dropped the FPS considerably on my parents' computer. Their computer is slow, but when I tested on my own computer I ran into the issue as well. So instead of drawing many small images (top left corner, top middle, top right corner, ...), I thought I could avoid the FPS drop by building a single large image for each GUI element. This would mean a few large images instead of hundreds of small ones. Since I wanted my GUI to be customizable, I needed a way to parse PNGs at runtime so I could generate those images from a template.

PIL is a really big package which handles lots of image formats. I only wanted to support PNG because it provides everything I need, so I decided against PIL. I also looked at pypng, but after reading the code I didn't want to use it either because of all the module-level functions; the code overall just looked messy.

So I looked up the PNG spec and started writing my own module. Once I was done I profiled my code against pypng and found that mine was a tiny bit slower for some PNGs and a bit faster for others. I made a bunch of changes and my PNG module became faster than pypng, fast enough to create large images at runtime.

That experience sparked me to profile a lot of other things, and my profiling folder currently contains 94 scripts. The results have influenced how I code, as I try to use the faster approaches I have discovered.

All profiling results are based on my 8-core AMD CPU. Results will of course vary by CPU speed and type, but from what I've seen the faster methods seem to be faster on other computers as well. Each script below is available for download, along with a link to the Python docs related to the aspects being tested. I've updated the times shown to include timings from Python 3.5 and 3.6; you will notice some definite speed improvements in 3.6. The new modulo test has only been run against Python 3.6.
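The original scripts aren't reproduced here, but timings like the ones below can be gathered with the standard library's timeit module. This is only a sketch of how one might do it; the statement, setup, and loop count are my own choices, not necessarily what the author's scripts use:

```python
import timeit

# Statement to measure and the setup it needs, mirroring the style of
# the tests below: a 2,000,000-iteration loop timed as a whole.
stmt = "x = '%r' % a"
setup = "a = 'Test'"

# timeit runs the statement `number` times and returns total seconds.
elapsed = timeit.timeit(stmt, setup=setup, number=2000000)
print('%.4f seconds' % elapsed)
```

timeit disables garbage collection by default during the measurement, which makes repeated runs a little more stable than timing a bare loop by hand.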

3.6 - 0.9688
3.5 - 1.0825
class Test1:
    def __init__(self, var1, var2=None): pass
for i in range(2000000): x = Test1('test')

3.6 - 1.0558
3.5 - 1.2040
class Test2:
    def __init__(self, var1, *, var2=None): pass
for i in range(2000000): x = Test2('test')

3.6 - 1.4016
3.5 - 1.9800
class Test1:
    def __init__(self, var1, var2=None): pass
for i in range(2000000): x = Test1('test', var2=1)

3.6 - 1.3618
3.5 - 2.0573
class Test2:
    def __init__(self, var1, *, var2=None): pass
for i in range(2000000): x = Test2('test', var2=1)

3.6 - 1.0711
3.5 - 1.1737
class Test3:
    def __init__(self, var1, var2, var3, var4=None, var5=None, var6=None): pass
for i in range(2000000): x = Test3('test', 1, True)

3.6 - 1.1896
3.5 - 1.2767
class Test4:
    def __init__(self, var1, var2, var3, *, var4=None, var5=None, var6=None): pass
for i in range(2000000): x = Test4('test', 1, True)

3.6 - 1.6682
3.5 - 2.3280
class Test3:
    def __init__(self, var1, var2, var3, var4=None, var5=None, var6=None): pass
for i in range(2000000): x = Test3('test', 1, True, var4='Test', var5=0, var6=False)

3.6 - 1.7289
3.5 - 2.4901
class Test4:
    def __init__(self, var1, var2, var3, *, var4=None, var5=None, var6=None): pass
for i in range(2000000): x = Test4('test', 1, True, var4='Test', var5=0, var6=False)

Having a bare * so that the parameters after it are keyword-only is a great idea. It is a great way to prevent bugs if keyword arguments get moved or reordered.

From the results above, simply having the bare * parameter seems to slow construction down. That is unfortunate, but despite the slowdown it may still be a good idea to use it, at least during development. If your code ends up being too slow you can always remove it.
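As a small illustration of the bug-prevention point, here is what the bare * actually buys you; the Widget/SafeWidget classes are my own example, not from the tests above:

```python
class Widget:
    # var2 can be passed positionally or by keyword.
    def __init__(self, var1, var2=None):
        self.var1, self.var2 = var1, var2

class SafeWidget:
    # The bare * forces var2 to be passed by keyword only.
    def __init__(self, var1, *, var2=None):
        self.var1, self.var2 = var1, var2

Widget('test', 1)            # silently binds 1 to var2
SafeWidget('test', var2=1)   # fine
try:
    SafeWidget('test', 1)    # positional var2 is rejected
except TypeError as e:
    print('caught:', e)
```

If someone later inserts a new positional parameter into Widget, existing calls like Widget('test', 1) keep working but now mean something different; SafeWidget turns that silent change into an immediate TypeError.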

2 Subclasses (1 - A, 1 - B)

3.6 - 2.4469
3.5 - 3.0987
class SuperA:
    def __init__(self):
        super().__init__()
        self.a = None
class SuperB:
    def __init__(self):
        super().__init__()
        self.b = 0
class SuperD(SuperA, SuperB):
    def __init__(self):
        super().__init__()
        self.d = None
for i in range(1000000): x = SuperD()

3.6 - 1.9557
3.5 - 2.5350
class NotSuperA:
    def __init__(self):
        object.__init__(self)
        self.a = None
class NotSuperB:
    def __init__(self):
        object.__init__(self)
        self.b = 0
class NotSuperD(NotSuperA, NotSuperB):
    def __init__(self):
        NotSuperA.__init__(self)
        NotSuperB.__init__(self)
        self.d = None
for i in range(1000000): x = NotSuperD()

2 Subclasses (1 - B, 1 - C)

3.6 - 2.5691
3.5 - 3.3851
class SuperB:
    def __init__(self):
        super().__init__()
        self.b = 0
class SuperC:
    def __init__(self):
        super().__init__()
        self.c = {}
class SuperE(SuperB, SuperC):
    def __init__(self):
        super().__init__()
        self.e = 1
for i in range(1000000): x = SuperE()

3.6 - 2.0490
3.5 - 2.8063
class NotSuperB:
    def __init__(self):
        object.__init__(self)
        self.b = 0
class NotSuperC:
    def __init__(self):
        object.__init__(self)
        self.c = {}
class NotSuperE(NotSuperB, NotSuperC):
    def __init__(self):
        NotSuperC.__init__(self)
        NotSuperB.__init__(self)
        self.e = 1
for i in range(1000000): x = NotSuperE()

4 Subclasses (1 - A, 1 - B, 1 - C, 1 - D)

3.6 - 4.1499
3.5 - 5.0575
class SuperA:
    def __init__(self):
        super().__init__()
        self.a = None
class SuperB:
    def __init__(self):
        super().__init__()
        self.b = 0
class SuperC:
    def __init__(self):
        super().__init__()
        self.c = {}
class SuperD(SuperA, SuperB):
    def __init__(self):
        super().__init__()
        self.d = None
class SuperF(SuperD, SuperC):
    def __init__(self):
        super().__init__()
        self.f = 'Test'
for i in range(1000000): x = SuperF()

3.6 - 3.3413
3.5 - 4.3231
class NotSuperA:
    def __init__(self):
        object.__init__(self)
        self.a = None
class NotSuperB:
    def __init__(self):
        object.__init__(self)
        self.b = 0
class NotSuperC:
    def __init__(self):
        object.__init__(self)
        self.c = {}
class NotSuperD(NotSuperA, NotSuperB):
    def __init__(self):
        NotSuperA.__init__(self)
        NotSuperB.__init__(self)
        self.d = None
class NotSuperF(NotSuperD, NotSuperC):
    def __init__(self):
        NotSuperD.__init__(self)
        NotSuperC.__init__(self)
        self.f = 'Test'
for i in range(1000000): x = NotSuperF()

8 Subclasses (1 - A, 2 - B, 2 - C, 1 - D, 1 - E, 1 - F)

3.6 - 5.6171
3.5 - 6.8697
class SuperA:
    def __init__(self):
        super().__init__()
        self.a = None
class SuperB:
    def __init__(self):
        super().__init__()
        self.b = 0
class SuperC:
    def __init__(self):
        super().__init__()
        self.c = {}
class SuperD(SuperA, SuperB):
    def __init__(self):
        super().__init__()
        self.d = None
class SuperE(SuperB, SuperC):
    def __init__(self):
        super().__init__()
        self.e = 1
class SuperF(SuperD, SuperC):
    def __init__(self):
        super().__init__()
        self.f = 'Test'
class SuperG(SuperE, SuperF):
    def __init__(self):
        super().__init__()
        self.g = []
for i in range(1000000): x = SuperG()

3.6 - 5.740
3.5 - 7.2918
class NotSuperA:
    def __init__(self):
        object.__init__(self)
        self.a = None
class NotSuperB:
    def __init__(self):
        object.__init__(self)
        self.b = 0
class NotSuperC:
    def __init__(self):
        object.__init__(self)
        self.c = {}
class NotSuperD(NotSuperA, NotSuperB):
    def __init__(self):
        NotSuperA.__init__(self)
        NotSuperB.__init__(self)
        self.d = None
class NotSuperE(NotSuperB, NotSuperC):
    def __init__(self):
        NotSuperC.__init__(self)
        NotSuperB.__init__(self)
        self.e = 1
class NotSuperF(NotSuperD, NotSuperC):
    def __init__(self):
        NotSuperD.__init__(self)
        NotSuperC.__init__(self)
        self.f = 'Test'
class NotSuperG(NotSuperE, NotSuperF):
    def __init__(self):
        NotSuperE.__init__(self)
        NotSuperF.__init__(self)
        self.g = []
for i in range(1000000): x = NotSuperG()

If you look at the results above you will see that super() is slower than calling the base class __init__ methods directly, except for class G. With simple class structures super() is indeed slower.

Notice, though, that by the time we reach class G there are eight classes involved in the hierarchy.

With structures where the same base class appears multiple times, super() only calls each class once. If you call the base classes' methods directly instead of using super(), those shared base classes end up getting called multiple times. Calling a base class's method more than once can lead to bugs and is slower than calling it once. So even though super() is slower for simple class structures, it is the better way of doing things because it prevents bugs.
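The double-call problem can be shown directly. In a diamond like the structures above, explicit base calls run the shared base twice while super() runs it once; the classes and the counter attribute here are my own illustration, not the timed code:

```python
class Base:
    def __init__(self):
        super().__init__()
        # Count how many times this __init__ actually runs.
        self.count = getattr(self, 'count', 0) + 1

class Left(Base):
    def __init__(self):
        super().__init__()

class Right(Base):
    def __init__(self):
        super().__init__()

class Diamond(Left, Right):
    def __init__(self):
        super().__init__()   # Base.__init__ runs exactly once

class BadDiamond(Left, Right):
    def __init__(self):
        Left.__init__(self)  # each explicit call walks up to Base again
        Right.__init__(self)

print(Diamond().count)     # 1
print(BadDiamond().count)  # 2
```

With super(), the method resolution order guarantees each class in the hierarchy is visited exactly once, no matter how many times it appears as a base.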

Comparing objects is a very big topic with many advantages and disadvantages depending on what you need to do and how you do it. Take a look at the Wikipedia duck typing link; the last paragraph under the Python heading ("Or, a more common use ...") is one I completely agree with.

Comparing objects in any form takes time. In a game engine where someone wants 60 FPS, I personally think the code should avoid comparisons where possible, simply because they slow things down. I'll explain what I mean and provide a good alternative.

Let's take the following code, which is a simplified and generalized version of a function within Panda3D.

class Example1:
    def func(self, func_or_task):
        if isinstance(func_or_task, Task):
            task = func_or_task
        elif callable(func_or_task):
            task = PythonTask(func_or_task)
        else:
            logAnError('Error ' + 'Message')
        if hasattr(task, 'attr'):
            do_something

My first impression of this code is that it has some issues. The first branch uses isinstance, which is slow; to speed it up when someone is passing in a function, I'd do the callable test first so that isinstance only runs when needed. If func_or_task fails both tests and falls into the else branch where an error gets logged, the next line will actually blow up anyway because the task variable was never assigned. The function is written expecting the programmer to pass in any old object, which doesn't give the programmer the benefit of the doubt.

I would make the following changes.

class Example2:
    def func1(self, task):
        if hasattr(task, 'attr'):
            do_something

    def func2(self, func):
        task = PythonTask(func)
        if hasattr(task, 'attr'):
            do_something

Now you have two functions, each of which takes in only one type of object. This makes the API a bit more complex because you need to know which function to call, but the functions can be named appropriately to make that easier. The original function was going to blow up anyway if the programmer passed in the wrong type of object, so just let it. The code now runs faster when someone is doing things correctly, which is what you want in a game engine in the first place, and the programmer takes control of comparing objects only when they actually need to.

a = {}

3.6 - 1.5859
3.5 - 1.6503
a = {}
for i in range(3000000):
    if hasattr(a, 'keys'): a.keys()
    else: pass

3.6 - 1.8067
3.5 - 2.2317
a = {}
for i in range(3000000):
    if hasattr(a, 'startswith'): pass
    else: pass

3.6 - 0.7331
3.5 - 0.8021
a = {}
for i in range(3000000):
    try: a.keys()
    except AttributeError: pass

3.6 - 1.9348
3.5 - 2.5920
a = {}
for i in range(3000000):
    try: a.startswith('Test')
    except AttributeError: pass

3.6 - 1.5509
3.5 - 1.5525
a = {}
for i in range(3000000):
    if isinstance(a, dict): a.keys()
    else: pass

3.6 - 0.9438
3.5 - 0.9707
a = {}
for i in range(3000000):
    if isinstance(a, str): pass
    else: pass

3.6 - 1.2148
3.5 - 1.2782
a = {}
for i in range(3000000):
    if type(a) is dict: a.keys()
    else: pass

3.6 - 0.4371
3.5 - 0.4293
a = {}
for i in range(3000000):
    if type(a) is str: pass
    else: pass

b = 0

3.6 - 0.9350
3.5 - 0.9292
b = 0
for i in range(3000000):
    if hasattr(b, 'real'): b.real
    else: pass

3.6 - 1.8891
3.5 - 2.2702
b = 0
for i in range(3000000):
    if hasattr(b, 'startswith'): pass
    else: pass

3.6 - 0.2082
3.5 - 0.2374
b = 0
for i in range(3000000):
    try: b.real
    except AttributeError: pass

3.6 - 2.0158
3.5 - 2.5719
b = 0
for i in range(3000000):
    try: x = b.startswith('Test')
    except AttributeError: pass

3.6 - 1.0285
3.5 - 0.9526
b = 0
for i in range(3000000):
    if isinstance(b, int): b.real
    else: pass

3.6 - 0.9335
3.5 - 0.9396
b = 0
for i in range(3000000):
    if isinstance(b, str): pass
    else: pass

3.6 - 0.6803
3.5 - 0.6648
b = 0
for i in range(3000000):
    if type(b) is int: b.real
    else: pass

3.6 - 0.4962
3.5 - 0.4356
b = 0
for i in range(3000000):
    if type(b) is str: pass
    else: pass

c = 'Testing'

3.6 - 1.9579
3.5 - 2.0039
c = 'Testing'
for i in range(3000000):
    if hasattr(c, 'startswith'): x = c.startswith('Test')
    else: pass

3.6 - 1.8569
3.5 - 2.2129
c = 'Testing'
for i in range(3000000):
    if hasattr(c, 'keys'): pass
    else: pass

3.6 - 1.0231
3.5 - 1.1315
c = 'Testing'
for i in range(3000000):
    try: x = c.startswith('Test')
    except AttributeError: pass

3.6 - 1.9852
3.5 - 2.5461
c = 'Testing'
for i in range(3000000):
    try: x = c.keys()
    except AttributeError: pass

3.6 - 1.8693
3.5 - 1.9501
c = 'Testing'
for i in range(3000000):
    if isinstance(c, str): x = c.startswith('Test')
    else: pass

3.6 - 0.9801
3.5 - 0.9164
c = 'Testing'
for i in range(3000000):
    if isinstance(c, dict): pass
    else: pass

3.6 - 1.5584
3.5 - 1.5458
c = 'Testing'
for i in range(3000000):
    if type(c) is str: x = c.startswith('Test')
    else: pass

3.6 - 0.4846
3.5 - 0.4107
c = 'Testing'
for i in range(3000000):
    if type(c) is dict: pass
    else: pass

Look first at the results where the object is what the code expects, so every check passes and no exception gets raised. You will notice that the fastest time is, of course, the try/except case where you just run the code without comparing objects at all. So the fastest code you can write is code that simply does what you want.

Now look at the results where the object is not what was expected, so the checks fail and exceptions get raised. You will notice that the slowest time is where an exception was raised and caught. This is because an exception object carries a bunch of data that has to be generated when the exception occurs.

If you need two code paths within a function, then you need a test to choose between them. As the results show, the type test is the fastest of the object tests, and unlike the try/except approach it stays fast when the object is not what you expected. It is followed by isinstance and hasattr, which are nearly the same speed except when hasattr fails. Because hasattr works by calling getattr and catching the error, it ends up being the slowest test when it fails. So comparing objects with type or isinstance is probably the best solution if you really need to.
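One caveat before reaching for the faster type test: it is not interchangeable with isinstance. type(x) is dict only matches dict exactly, while isinstance also accepts subclasses, so the two can disagree; OrderedDict here is just one example of a dict subclass:

```python
from collections import OrderedDict

a = OrderedDict(Test=1)

print(isinstance(a, dict))   # True: OrderedDict is a dict subclass
print(type(a) is dict)       # False: type checks for an exact match
```

If subclasses of the expected type might be passed in, isinstance is the safer choice despite being a little slower.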

Personally I think my solution above is the better way to design classes.

a = 'Test'

3.6 - 0.7017
3.5 - 0.7420
a = 'Test'
for i in range(2000000): x = '%s' % repr(a)

3.6 - 0.3738
3.5 - 0.3522
a = 'Test'
for i in range(2000000): x = '%r' % a

b = {'Test': 1}

3.6 - 1.6903
3.5 - 2.2960
b = {'Test': 1}
for i in range(2000000): x = '%s' % repr(b)

3.6 - 1.1823
3.5 - 1.8091
b = {'Test': 1}
for i in range(2000000): x = '%r' % b

c = 1

3.6 - 0.6888
3.5 - 0.7536
c = 1
for i in range(2000000): x = '%s' % repr(c)

3.6 - 0.4069
3.5 - 0.4310
c = 1
for i in range(2000000): x = '%r' % c

d = 0.25

3.6 - 1.6180
3.5 - 2.1191
d = 0.25
for i in range(2000000): x = '%s' % repr(d)

3.6 - 1.0616
3.5 - 1.5988
d = 0.25
for i in range(2000000): x = '%r' % d

e = Decimal('Infinity')

3.6 - 1.2678
3.5 - 1.8686
from decimal import Decimal
e = Decimal('Infinity')
for i in range(2000000): x = '%s' % repr(e)

3.6 - 0.8707
3.5 - 1.4408
from decimal import Decimal
e = Decimal('Infinity')
for i in range(2000000): x = '%r' % e

Calling repr explicitly outside the C-style string formatting adds an extra function call, which is why it is slower. You should always use %r if you want a repr.
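For reference, the two spellings produce identical strings; %r simply performs the repr conversion internally. The sample values here are my own:

```python
a = 'Test'
d = 0.25

# '%r' gives the same result as formatting repr(...) with '%s'.
print('%s' % repr(a))  # 'Test'  (quotes included, because it is a repr)
print('%r' % a)        # 'Test'
print('%r' % d)        # 0.25
```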

a = 'Test'

3.6 - 0.5132
3.5 - 0.5895
a = 'Test'
for i in range(2000000): x = '%s' % str(a)

3.6 - 0.2194
3.5 - 0.2405
a = 'Test'
for i in range(2000000): x = '%s' % a

b = {'Test': 1}

3.6 - 1.5349
3.5 - 2.4255
b = {'Test': 1}
for i in range(2000000): x = '%s' % str(b)

3.6 - 1.1225
3.5 - 1.8897
b = {'Test': 1}
for i in range(2000000): x = '%s' % b

c = 1

3.6 - 0.6403
3.5 - 0.7939
c = 1
for i in range(2000000): x = '%s' % str(c)

3.6 - 0.3831
3.5 - 0.4583
c = 1
for i in range(2000000): x = '%s' % c

d = 0.25

3.6 - 1.4561
3.5 - 1.9764
d = 0.25
for i in range(2000000): x = '%s' % str(d)

3.6 - 1.0491
3.5 - 1.5127
d = 0.25
for i in range(2000000): x = '%s' % d

e = Decimal('Infinity')

3.6 - 0.6679
3.5 - 1.2021
from decimal import Decimal
e = Decimal('Infinity')
for i in range(2000000): x = '%s' % str(e)

3.6 - 0.3461
3.5 - 0.7679
from decimal import Decimal
e = Decimal('Infinity')
for i in range(2000000): x = '%s' % e

Calling str explicitly adds an extra function call on top of the conversion that %s already performs, which is why it is slower. You should always use plain %s if you want a string.

Callable Object

3.6 - 1.3813
3.5 - 1.4271
class Success:
    def __call__(self): pass
a = Success()
for x in range(5000000):
    if hasattr(a, '__call__'): pass

3.6 - 0.9778
3.5 - 0.9250
class Success:
    def __call__(self): pass
a = Success()
for x in range(5000000):
    if callable(a): pass

Not Callable Object

3.6 - 3.1771
3.5 - 3.0753
class Failure: pass
b = Failure()
for x in range(5000000):
    if hasattr(b, '__call__'): pass

3.6 - 0.9702
3.5 - 0.9424
class Failure: pass
b = Failure()
for x in range(5000000):
    if callable(b): pass

Both approaches give the same answer in the passing and failing cases.

hasattr calls getattr to see whether it raises AttributeError. callable works differently and doesn't carry the overhead of raising and catching errors.
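A quick side-by-side check of the two tests, mirroring the Success/Failure classes above:

```python
class Success:
    def __call__(self):
        pass

class Failure:
    pass

a, b = Success(), Failure()

# Both tests agree, but callable avoids the getattr machinery.
print(callable(a), hasattr(a, '__call__'))  # True True
print(callable(b), hasattr(b, '__call__'))  # False False
```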

WARNING: fromkeys should only be used with immutable default values.

Default Value = 0

3.6 - 0.3065
3.5 - 0.2987
a = {}
for i in range(2000000): a[i] = 0

3.6 - 0.3848
3.5 - 0.4067
a = {}
for i in range(2000000):
    if i not in a: a[i] = 0

3.6 - 0.6707
3.5 - 0.6310
from collections import defaultdict
b = defaultdict(int)
for i in range(2000000): b[i]

3.6 - 0.7073
3.5 - 0.7407
a = {}
for i in range(2000000): a.setdefault(i, 0)

3.6 - 0.2523
3.5 - 0.2548
a = {}.fromkeys(range(2000000), 0)

3.6 - 0.3015
3.5 - 0.2863
a = {i: 0 for i in range(2000000)}

3.6 - 0.6824
3.5 - 0.6612
a = dict((i, 0) for i in range(2000000))

Default Value = 1

3.6 - 0.3207
3.5 - 0.3161
a = {}
for i in range(2000000): a[i] = 1

3.6 - 0.3889
3.5 - 0.3756
a = {}
for i in range(2000000):
    if i not in a: a[i] = 1

3.6 - 0.9670
3.5 - 1.0192
from collections import defaultdict
c = defaultdict(lambda: 1)
for i in range(2000000): c[i]

3.6 - 0.7136
3.5 - 0.7089
a = {}
for i in range(2000000): a.setdefault(i, 1)

3.6 - 0.2652
3.5 - 0.2597
a = {}.fromkeys(range(2000000), 1)

3.6 - 0.3026
3.5 - 0.3098
a = {i: 1 for i in range(2000000)}

3.6 - 0.7107
3.5 - 0.6986
a = dict((i, 1) for i in range(2000000))

Default Value = {}

3.6 - 0.6460
3.5 - 0.9237
a = {}
for i in range(2000000): a[i] = {}

3.6 - 0.7354
3.5 - 1.0297
a = {}
for i in range(2000000):
    if i not in a: a[i] = {}

3.6 - 0.9730
3.5 - 1.3183
from collections import defaultdict
d = defaultdict(dict)
for i in range(2000000): d[i]

3.6 - 1.0888
3.5 - 1.4463
a = {}
for i in range(2000000): a.setdefault(i, {})

3.6 - 0.2531
3.5 - 0.2545
a = {}.fromkeys(range(2000000), {})

3.6 - 0.6238
3.5 - 0.9500
a = {i: {} for i in range(2000000)}

3.6 - 1.1287
3.5 - 1.4195
a = dict((i, {}) for i in range(2000000))

There are many ways to create a dictionary with a default value for each key. Some of them look cool, but most are not as fast as simply doing a[i] = ?, with a membership test first if you need one. Looking cool is not a good reason to choose a piece of code.

fromkeys is the fastest way to create a dictionary with default values, but you must never use a mutable default value. fromkeys creates a single default object and assigns that same object to every key. It is the same issue described in Multiple vs. Single Assignment.
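The warning can be demonstrated directly; with a mutable default, every key ends up sharing one object:

```python
# fromkeys assigns the SAME dict object to all three keys.
a = {}.fromkeys(range(3), {})
a[0]['oops'] = 1
print(a)  # {0: {'oops': 1}, 1: {'oops': 1}, 2: {'oops': 1}}

# A dict comprehension builds a fresh dict for each key instead.
b = {i: {} for i in range(3)}
b[0]['ok'] = 1
print(b)  # {0: {'ok': 1}, 1: {}, 2: {}}
```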

Division and divmod[0]

3.6 - 2.4997
for i in range(10000000): x = int(i / 16)
3.6 - 3.3264
for i in range(10000000): x = divmod(i, 16)[0]

Modulo and divmod[1]

3.6 - 0.6935
for i in range(10000000): x = i % 16
3.6 - 3.4652
for i in range(10000000): x = divmod(i, 16)[1]

Modulo and Division and divmod

3.6 - 2.9782
for i in range(10000000):
    a = int(i / 16)
    b = i % 16
3.6 - 3.2694
for i in range(10000000): a, b = divmod(i, 16)
3.6 - 3.7674
for i in range(10000000):
    x = divmod(i, 16)
    a = x[0]
    b = x[1]

If you look at the results above you can see that divmod is not as fast as you might expect; it is slower in every test, most likely because of the function call and tuple overhead compared to the plain operators. It can make for easier to read code, but I would suggest adding a comment and using the faster forms.
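For clarity, divmod produces exactly the same quotient and remainder as the separate operations, so swapping it out doesn't change results. Note that i // 16 is the direct floor-division spelling of int(i / 16) for non-negative i; the // form is my addition and wasn't timed above:

```python
i = 1000003

# divmod returns the (quotient, remainder) pair in one call.
q, r = divmod(i, 16)

# The operator forms produce the same values.
print(q == i // 16 == int(i / 16))  # True
print(r == i % 16)                  # True
```

For very large integers, int(i / 16) can lose precision because / goes through a float, whereas i // 16 and divmod stay exact.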