
Python 3

When I was starting to work on the code for my game, I found that drawing a couple hundred small images in panda3d on my parents' computer made the FPS drop considerably. Their computer is slow, but when I tested it on my computer I ran into the same issue. So instead of using many small images (top left corner, top middle, top right corner, ...) I thought I could avoid the FPS drop by building a single image for each GUI element. That way I would have several large images instead of hundreds of small ones. Since I wanted the GUI to be customizable, I needed a way to parse PNGs at runtime so I could generate those images from a template.

PIL is a really big package which handles lots of image formats. I only wanted to support PNG because it provides everything I need, so I decided against PIL. I then looked at pypng, and after reading the code I didn't want to use it either: it is full of module-level functions and the code just looked messy. I wanted something clean, fast, and class-based, so I decided to try writing my own.

So I looked up the PNG spec and started writing. Once I was done I profiled my code against pypng and found that mine was a tiny bit slower for some PNGs and a bit faster for others. That is what spawned my profiling code, because I wanted to make my module faster. After a bunch of changes my PNG module became faster than pypng, fast enough to generate the images live.

This led me to profile a lot of things, and my profiling folder currently contains 70 scripts. The results have influenced how I write code, nudging me toward the faster approaches.

All profiling results are based on my 8-core AMD CPU. Results will of course vary with CPU speed and type, but from what I've seen the faster methods stay faster on other computers as well. Each script below is available for download, along with a link to the Python docs relevant to the test.
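The downloadable scripts themselves are not reproduced here, but a minimal harness along these lines is enough to reproduce the kind of numbers shown below. This is a sketch of my own, not the contents of the actual scripts; the function names and the use of time.perf_counter are my assumptions.

    import time

    def profile(label, func, loops):
        # Run the snippet once over `loops` iterations and report the wall time.
        start = time.perf_counter()
        func(loops)
        elapsed = time.perf_counter() - start
        print('{:,} Loops - {:.4f} seconds  {}'.format(loops, elapsed, label))

    def keyword_default(loops):
        class Test1:
            def __init__(self, var1, var2=None):
                pass
        for i in range(loops):
            x = Test1('test')

    profile('Test1, positional call', keyword_default, 2000000)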

2,000,000 Loops - 1.0825 seconds
    class Test1:
        def __init__(self, var1, var2=None):
            pass

    for i in range(2000000):
        x = Test1('test')

2,000,000 Loops - 1.2040 seconds
    class Test2:
        def __init__(self, var1, *, var2=None):
            pass

    for i in range(2000000):
        x = Test2('test')

2,000,000 Loops - 1.9800 seconds
    class Test1:
        def __init__(self, var1, var2=None):
            pass

    for i in range(2000000):
        x = Test1('test', var2=1)

2,000,000 Loops - 2.0573 seconds
    class Test2:
        def __init__(self, var1, *, var2=None):
            pass

    for i in range(2000000):
        x = Test2('test', var2=1)

2,000,000 Loops - 1.1737 seconds
    class Test3:
        def __init__(self, var1, var2, var3, var4=None, var5=None, var6=None):
            pass

    for i in range(2000000):
        x = Test3('test', 1, True)

2,000,000 Loops - 1.2767 seconds
    class Test4:
        def __init__(self, var1, var2, var3, *, var4=None, var5=None, var6=None):
            pass

    for i in range(2000000):
        x = Test4('test', 1, True)

2,000,000 Loops - 2.3280 seconds
    class Test3:
        def __init__(self, var1, var2, var3, var4=None, var5=None, var6=None):
            pass

    for i in range(2000000):
        x = Test3('test', 1, True, var4='Test', var5=0, var6=False)

2,000,000 Loops - 2.4901 seconds
    class Test4:
        def __init__(self, var1, var2, var3, *, var4=None, var5=None, var6=None):
            pass

    for i in range(2000000):
        x = Test4('test', 1, True, var4='Test', var5=0, var6=False)

Using a bare * so that the parameters after it are keyword-only is a great idea. It is a great way to prevent bugs if arguments ever get reordered, because callers are forced to pass those values by name.

From the results above, simply having the bare * in the signature slows construction down. That is unfortunate, but despite the slowdown it may still be worth using, at least during development. If your code ends up being too slow you can always remove it.
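As a quick illustration of the bug it prevents (a sketch of my own, not one of the profiled scripts; the Widget class and its parameters are made up): with a bare *, a misplaced positional argument fails immediately instead of silently landing in the wrong parameter.

    class Widget:
        def __init__(self, name, *, width=None, height=None):
            self.name = name
            self.width = width
            self.height = height

    # Keyword-only arguments must be named at the call site.
    w = Widget('button', width=32, height=16)

    # Passing them positionally raises TypeError instead of silently
    # assigning 32 and 16 to the wrong parameters.
    try:
        Widget('button', 32, 16)
    except TypeError as exc:
        print(exc)  # takes 2 positional arguments but 4 were given (wording varies by version)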

2 Subclasses (1 - A, 1 - B)

1,000,000 Loops - 3.0987 seconds
    class SuperA:
        def __init__(self):
            super().__init__()
            self.a = None

    class SuperB:
        def __init__(self):
            super().__init__()
            self.b = 0

    class SuperD(SuperA, SuperB):
        def __init__(self):
            super().__init__()
            self.d = None

    for i in range(1000000):
        x = SuperD()

1,000,000 Loops - 2.5350 seconds
    class NotSuperA:
        def __init__(self):
            object.__init__(self)
            self.a = None

    class NotSuperB:
        def __init__(self):
            object.__init__(self)
            self.b = 0

    class NotSuperD(NotSuperA, NotSuperB):
        def __init__(self):
            NotSuperA.__init__(self)
            NotSuperB.__init__(self)
            self.d = None

    for i in range(1000000):
        x = NotSuperD()

2 Subclasses (1 - B, 1 - C)

1,000,000 Loops - 3.3851 seconds
    class SuperB:
        def __init__(self):
            super().__init__()
            self.b = 0

    class SuperC:
        def __init__(self):
            super().__init__()
            self.c = {}

    class SuperE(SuperB, SuperC):
        def __init__(self):
            super().__init__()
            self.e = 1

    for i in range(1000000):
        x = SuperE()

1,000,000 Loops - 2.8063 seconds
    class NotSuperB:
        def __init__(self):
            object.__init__(self)
            self.b = 0

    class NotSuperC:
        def __init__(self):
            object.__init__(self)
            self.c = {}

    class NotSuperE(NotSuperB, NotSuperC):
        def __init__(self):
            NotSuperC.__init__(self)
            NotSuperB.__init__(self)
            self.e = 1

    for i in range(1000000):
        x = NotSuperE()

4 Subclasses (1 - A, 1 - B, 1 - C, 1 - D)

1,000,000 Loops - 5.0575 seconds
    class SuperA:
        def __init__(self):
            super().__init__()
            self.a = None

    class SuperB:
        def __init__(self):
            super().__init__()
            self.b = 0

    class SuperC:
        def __init__(self):
            super().__init__()
            self.c = {}

    class SuperD(SuperA, SuperB):
        def __init__(self):
            super().__init__()
            self.d = None

    class SuperF(SuperD, SuperC):
        def __init__(self):
            super().__init__()
            self.f = 'Test'

    for i in range(1000000):
        x = SuperF()

1,000,000 Loops - 4.3231 seconds
    class NotSuperA:
        def __init__(self):
            object.__init__(self)
            self.a = None

    class NotSuperB:
        def __init__(self):
            object.__init__(self)
            self.b = 0

    class NotSuperC:
        def __init__(self):
            object.__init__(self)
            self.c = {}

    class NotSuperD(NotSuperA, NotSuperB):
        def __init__(self):
            NotSuperA.__init__(self)
            NotSuperB.__init__(self)
            self.d = None

    class NotSuperF(NotSuperD, NotSuperC):
        def __init__(self):
            NotSuperD.__init__(self)
            NotSuperC.__init__(self)
            self.f = 'Test'

    for i in range(1000000):
        x = NotSuperF()

8 Subclasses (1 - A, 2 - B, 2 - C, 1 - D, 1 - E, 1 - F)

1,000,000 Loops - 6.8697 seconds
    class SuperA:
        def __init__(self):
            super().__init__()
            self.a = None

    class SuperB:
        def __init__(self):
            super().__init__()
            self.b = 0

    class SuperC:
        def __init__(self):
            super().__init__()
            self.c = {}

    class SuperD(SuperA, SuperB):
        def __init__(self):
            super().__init__()
            self.d = None

    class SuperE(SuperB, SuperC):
        def __init__(self):
            super().__init__()
            self.e = 1

    class SuperF(SuperD, SuperC):
        def __init__(self):
            super().__init__()
            self.f = 'Test'

    class SuperG(SuperE, SuperF):
        def __init__(self):
            super().__init__()
            self.g = []

    for i in range(1000000):
        x = SuperG()

1,000,000 Loops - 7.2918 seconds
    class NotSuperA:
        def __init__(self):
            object.__init__(self)
            self.a = None

    class NotSuperB:
        def __init__(self):
            object.__init__(self)
            self.b = 0

    class NotSuperC:
        def __init__(self):
            object.__init__(self)
            self.c = {}

    class NotSuperD(NotSuperA, NotSuperB):
        def __init__(self):
            NotSuperA.__init__(self)
            NotSuperB.__init__(self)
            self.d = None

    class NotSuperE(NotSuperB, NotSuperC):
        def __init__(self):
            NotSuperC.__init__(self)
            NotSuperB.__init__(self)
            self.e = 1

    class NotSuperF(NotSuperD, NotSuperC):
        def __init__(self):
            NotSuperD.__init__(self)
            NotSuperC.__init__(self)
            self.f = 'Test'

    class NotSuperG(NotSuperE, NotSuperF):
        def __init__(self):
            NotSuperE.__init__(self)
            NotSuperF.__init__(self)
            self.g = []

    for i in range(1000000):
        x = NotSuperG()

If you look at the tables above you will see that super is slower than not using super for every case except class G. With simple class structures super really is slower.

By the time we reach class G, however, we have eight subclasses, and some of them (B and C) appear twice in the hierarchy.

With structures where the same subclass shows up multiple times, super only calls each one's __init__ once. If you call the subclasses' methods explicitly instead of using super, the shared subclasses end up being called multiple times. Calling a subclass's method more than once can lead to bugs and is slower than calling it once. So even though super is slower for simple class structures, it is the better way of doing things because it prevents those bugs.
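Here is a small sketch of that difference, separate from the profiled classes above: a shared base class counts how many times its __init__ runs in a diamond-shaped hierarchy.

    class Base:
        def __init__(self):
            super().__init__()
            # Count how many times this __init__ runs on the instance.
            self.count = getattr(self, 'count', 0) + 1

    class LeftSuper(Base):
        def __init__(self):
            super().__init__()

    class RightSuper(Base):
        def __init__(self):
            super().__init__()

    class DiamondSuper(LeftSuper, RightSuper):
        def __init__(self):
            super().__init__()

    class LeftExplicit(Base):
        def __init__(self):
            Base.__init__(self)

    class RightExplicit(Base):
        def __init__(self):
            Base.__init__(self)

    class DiamondExplicit(LeftExplicit, RightExplicit):
        def __init__(self):
            LeftExplicit.__init__(self)
            RightExplicit.__init__(self)

    print(DiamondSuper().count)     # 1 - super() walks the MRO, Base runs once
    print(DiamondExplicit().count)  # 2 - explicit calls run Base twice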

Comparing objects is a big topic, with advantages and disadvantages depending on what you need to do and how you do it. Take a look at the Wikipedia duck typing link; the last paragraph under the Python heading (the one starting "Or, a more common use ...") is one I completely agree with.

Comparing objects in any form takes time. When it comes to a game engine where someone wants to get 60 fps I personally think that the code should try to avoid comparisons simply because they slow things down. I'll explain what I mean and provide a good alternative.

Let's take the following code, which is a simplified and generalized version of a function within panda3d.

    class Example1:
        def func(self, func_or_task):
            if isinstance(func_or_task, Task):
                task = func_or_task
            elif callable(func_or_task):
                task = PythonTask(func_or_task)
            else:
                logAnError('Error ' + 'Message')
            if hasattr(task, 'attr'):
                do_something

My first impression is that this code has some issues. The first branch uses isinstance, which is slow; if functions are the common case, I'd do the callable test first so isinstance only runs when needed. And if func_or_task fails both tests and falls through to the else branch where an error gets logged, the very next line blows up anyway because the task variable was never assigned. The function is written expecting the programmer to pass in any old object, which doesn't give the programmer the benefit of the doubt.

I would make the following changes.

    class Example2:
        def func1(self, task):
            if hasattr(task, 'attr'):
                do_something

        def func2(self, func):
            task = PythonTask(func)
            if hasattr(task, 'attr'):
                do_something

Now you have two functions, each of which accepts only one type of object. This makes the API slightly larger because you need to know which function to call, but the functions can be named appropriately to make that easy. The original function was going to blow up anyway if the programmer passed in the wrong type of object, so just let it. The code now runs faster when things are done correctly, which is what you want in a game engine in the first place, and the programmer takes over the object checking only if they actually need to.
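For illustration, here is how the split API might look from the calling side. This is a sketch only; TaskManager, add_task, and add_function are made-up names, and PythonTask here is a stand-in, not panda3d's actual class.

    class PythonTask:
        # Stand-in for the real task wrapper.
        def __init__(self, func):
            self.func = func

    class TaskManager:
        def add_task(self, task):
            # Caller already has a task object.
            self._schedule(task)

        def add_function(self, func):
            # Caller has a plain callable; wrap it first.
            self._schedule(PythonTask(func))

        def _schedule(self, task):
            print('scheduled', task)

    mgr = TaskManager()
    mgr.add_task(PythonTask(lambda: None))  # you know you have a task
    mgr.add_function(lambda: None)          # you know you have a function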

a = {}

3,000,000 Loops - 1.6503 seconds
    a = {}
    for i in range(3000000):
        if hasattr(a, 'keys'):
            a.keys()
        else:
            pass
3,000,000 Loops - 2.2317 seconds
    a = {}
    for i in range(3000000):
        if hasattr(a, 'startswith'):
            pass
        else:
            pass

3,000,000 Loops - 0.8021 seconds
    a = {}
    for i in range(3000000):
        try:
            a.keys()
        except AttributeError:
            pass
3,000,000 Loops - 2.5920 seconds
    a = {}
    for i in range(3000000):
        try:
            a.startswith('Test')
        except AttributeError:
            pass

3,000,000 Loops - 1.5525 seconds
    a = {}
    for i in range(3000000):
        if isinstance(a, dict):
            a.keys()
        else:
            pass
3,000,000 Loops - 0.9707 seconds
    a = {}
    for i in range(3000000):
        if isinstance(a, str):
            pass
        else:
            pass

3,000,000 Loops - 1.2782 seconds
    a = {}
    for i in range(3000000):
        if type(a) is dict:
            a.keys()
        else:
            pass
3,000,000 Loops - 0.4293 seconds
    a = {}
    for i in range(3000000):
        if type(a) is str:
            pass
        else:
            pass

b = 0

3,000,000 Loops - 0.9292 seconds
    b = 0
    for i in range(3000000):
        if hasattr(b, 'real'):
            b.real
        else:
            pass
3,000,000 Loops - 2.2702 seconds
    b = 0
    for i in range(3000000):
        if hasattr(b, 'startswith'):
            pass
        else:
            pass

3,000,000 Loops - 0.2374 seconds
    b = 0
    for i in range(3000000):
        try:
            b.real
        except AttributeError:
            pass
3,000,000 Loops - 2.5719 seconds
    b = 0
    for i in range(3000000):
        try:
            x = b.startswith('Test')
        except AttributeError:
            pass

3,000,000 Loops - 0.9526 seconds
    b = 0
    for i in range(3000000):
        if isinstance(b, int):
            b.real
        else:
            pass
3,000,000 Loops - 0.9396 seconds
    b = 0
    for i in range(3000000):
        if isinstance(b, str):
            pass
        else:
            pass

3,000,000 Loops - 0.6648 seconds
    b = 0
    for i in range(3000000):
        if type(b) is int:
            b.real
        else:
            pass
3,000,000 Loops - 0.4356 seconds
    b = 0
    for i in range(3000000):
        if type(b) is str:
            pass
        else:
            pass

c = 'Testing'

3,000,000 Loops - 2.0039 seconds
    c = 'Testing'
    for i in range(3000000):
        if hasattr(c, 'startswith'):
            x = c.startswith('Test')
        else:
            pass
3,000,000 Loops - 2.2129 seconds
    c = 'Testing'
    for i in range(3000000):
        if hasattr(c, 'keys'):
            pass
        else:
            pass

3,000,000 Loops - 1.1315 seconds
    c = 'Testing'
    for i in range(3000000):
        try:
            x = c.startswith('Test')
        except AttributeError:
            pass
3,000,000 Loops - 2.5461 seconds
    c = 'Testing'
    for i in range(3000000):
        try:
            x = c.keys()
        except AttributeError:
            pass

3,000,000 Loops - 1.9501 seconds
    c = 'Testing'
    for i in range(3000000):
        if isinstance(c, str):
            x = c.startswith('Test')
        else:
            pass
3,000,000 Loops - 0.9164 seconds
    c = 'Testing'
    for i in range(3000000):
        if isinstance(c, dict):
            pass
        else:
            pass

3,000,000 Loops - 1.5458 seconds
    c = 'Testing'
    for i in range(3000000):
        if type(c) is str:
            x = c.startswith('Test')
        else:
            pass
3,000,000 Loops - 0.4107 seconds
    c = 'Testing'
    for i in range(3000000):
        if type(c) is dict:
            pass
        else:
            pass

Take a look at the first result in each pair above, where the object passes the test and no exception is raised. The fastest time is, of course, the try/except version where you simply run the code without checking the object first. So the fastest code you can write is the code that just does what you want.

Now take a look at the second result in each pair, where the object fails the test because it is not what was expected. The slowest time is the one where an exception was actually raised and caught. That is because an exception carries a bunch of data that has to be generated when it is raised.

If you need two code paths within a function then you need some kind of test to pick between them. Leaving aside the successful try/except case, the type test is the fastest in the results, followed by isinstance and hasattr, which are nearly the same speed except when hasattr fails. Because hasattr works by calling getattr inside a try/except, it ends up being the slowest test when the attribute is missing. So if you do need to compare objects, type or isinstance is probably the best choice.

Personally I think my solution above is the better way to design classes.

a = 'Test'

2,000,000 Loops - 0.7420 seconds
    a = 'Test'
    for i in range(2000000):
        x = '%s' % repr(a)
2,000,000 Loops - 0.3522 seconds
    a = 'Test'
    for i in range(2000000):
        x = '%r' % a

b = {'Test': 1}

2,000,000 Loops - 2.2960 seconds
    b = {'Test': 1}
    for i in range(2000000):
        x = '%s' % repr(b)
2,000,000 Loops - 1.8091 seconds
    b = {'Test': 1}
    for i in range(2000000):
        x = '%r' % b

c = 1

2,000,000 Loops - 0.7536 seconds
    c = 1
    for i in range(2000000):
        x = '%s' % repr(c)
2,000,000 Loops - 0.4310 seconds
    c = 1
    for i in range(2000000):
        x = '%r' % c

d = 0.25

2,000,000 Loops - 2.1191 seconds
    d = 0.25
    for i in range(2000000):
        x = '%s' % repr(d)
2,000,000 Loops - 1.5988 seconds
    d = 0.25
    for i in range(2000000):
        x = '%r' % d

e = Decimal('Infinity')

2,000,000 Loops - 1.8686 seconds
    from decimal import Decimal
    e = Decimal('Infinity')
    for i in range(2000000):
        x = '%s' % repr(e)
2,000,000 Loops - 1.4408 seconds
    from decimal import Decimal
    e = Decimal('Infinity')
    for i in range(2000000):
        x = '%r' % e

Calling repr explicitly outside of the C-style string formatting adds an extra function call, which is why it is slower. You should always use %r if you want a repr.
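Both forms produce exactly the same string, so this is purely a speed difference. A quick sanity check (a sketch of my own, not one of the downloadable scripts):

    values = ['Test', {'Test': 1}, 1, 0.25]
    for value in values:
        # %r formats with repr() internally, without the extra call.
        assert '%r' % (value,) == '%s' % (repr(value),)
    print('identical output for every value')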

a = 'Test'

2,000,000 Loops - 0.5895 seconds
    a = 'Test'
    for i in range(2000000):
        x = '%s' % str(a)
2,000,000 Loops - 0.2405 seconds
    a = 'Test'
    for i in range(2000000):
        x = '%s' % a

b = {'Test': 1}

2,000,000 Loops - 2.4255 seconds
    b = {'Test': 1}
    for i in range(2000000):
        x = '%s' % str(b)
2,000,000 Loops - 1.8897 seconds
    b = {'Test': 1}
    for i in range(2000000):
        x = '%s' % b

c = 1

2,000,000 Loops - 0.7939 seconds
    c = 1
    for i in range(2000000):
        x = '%s' % str(c)
2,000,000 Loops - 0.4583 seconds
    c = 1
    for i in range(2000000):
        x = '%s' % c

d = 0.25

2,000,000 Loops - 1.9764 seconds
    d = 0.25
    for i in range(2000000):
        x = '%s' % str(d)
2,000,000 Loops - 1.5127 seconds
    d = 0.25
    for i in range(2000000):
        x = '%s' % d

e = Decimal('Infinity')

2,000,000 Loops - 1.2021 seconds
    from decimal import Decimal
    e = Decimal('Infinity')
    for i in range(2000000):
        x = '%s' % str(e)
2,000,000 Loops - 0.7679 seconds
    from decimal import Decimal
    e = Decimal('Infinity')
    for i in range(2000000):
        x = '%s' % e

Calling str explicitly outside of the C-style string formatting adds an extra function call, which is why it is slower. You should always use %s if you want a string.

Callable Object

5,000,000 Loops - 1.3815 seconds
    class Success:
        def __call__(self):
            pass

    a = Success()
    for x in range(5000000):
        if hasattr(a, '__call__'):
            pass
5,000,000 Loops - 0.8642 seconds
    class Success:
        def __call__(self):
            pass

    a = Success()
    for x in range(5000000):
        if callable(a):
            pass

Not Callable Object

5,000,000 Loops - 3.7834 seconds
    class Failure:
        pass

    b = Failure()
    for x in range(5000000):
        if hasattr(b, '__call__'):
            pass
5,000,000 Loops - 0.8490 seconds
    class Failure:
        pass

    b = Failure()
    for x in range(5000000):
        if callable(b):
            pass

Both approaches pass and fail the same cases; they always give the same answer.

hasattr works by calling getattr and checking whether it raises AttributeError. callable does its check differently and does not carry the overhead of raising and catching an exception.
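For what it's worth, callable also reads better and handles plain functions, lambdas, and classes as well as instances with __call__. A small sketch of my own showing the two checks agreeing:

    class WithCall:
        def __call__(self):
            return 'called'

    class WithoutCall:
        pass

    def plain_function():
        return 'called'

    # Both checks give the same answer for each object;
    # callable just gets there without the exception machinery.
    for obj in (WithCall(), WithoutCall(), plain_function, WithoutCall, 42):
        assert callable(obj) == hasattr(obj, '__call__')
        print(type(obj).__name__, callable(obj))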

WARNING: Fromkeys should only be used with immutable defaults.

Default Value = 0

2,000,000 Loops - 0.2987 seconds
    a = {}
    for i in range(2000000):
        a[i] = 0
2,000,000 Loops - 0.4067 seconds
    a = {}
    for i in range(2000000):
        if i not in a:
            a[i] = 0
2,000,000 Loops - 0.6310 seconds
    from collections import defaultdict
    b = defaultdict(int)
    for i in range(2000000):
        b[i]
2,000,000 Loops - 0.7407 seconds
    a = {}
    for i in range(2000000):
        a.setdefault(i, 0)
2,000,000 Loops - 0.2548 seconds
    a = {}.fromkeys(range(2000000), 0)
2,000,000 Loops - 0.2863 seconds
    a = {i: 0 for i in range(2000000)}
2,000,000 Loops - 0.6612 seconds
    a = dict((i, 0) for i in range(2000000))

Default Value = 1

2,000,000 Loops - 0.3161 seconds
    a = {}
    for i in range(2000000):
        a[i] = 1
2,000,000 Loops - 0.3756 seconds
    a = {}
    for i in range(2000000):
        if i not in a:
            a[i] = 1
2,000,000 Loops - 1.0192 seconds
    from collections import defaultdict
    c = defaultdict(lambda: 1)
    for i in range(2000000):
        c[i]
2,000,000 Loops - 0.7089 seconds
    a = {}
    for i in range(2000000):
        a.setdefault(i, 1)
2,000,000 Loops - 0.2597 seconds
    a = {}.fromkeys(range(2000000), 1)
2,000,000 Loops - 0.3098 seconds
    a = {i: 1 for i in range(2000000)}
2,000,000 Loops - 0.6986 seconds
    a = dict((i, 1) for i in range(2000000))

Default Value = {}

2,000,000 Loops - 0.9237 seconds
    a = {}
    for i in range(2000000):
        a[i] = {}
2,000,000 Loops - 1.0297 seconds
    a = {}
    for i in range(2000000):
        if i not in a:
            a[i] = {}
2,000,000 Loops - 1.3183 seconds
    from collections import defaultdict
    d = defaultdict(dict)
    for i in range(2000000):
        d[i]
2,000,000 Loops - 1.4463 seconds
    a = {}
    for i in range(2000000):
        a.setdefault(i, {})
2,000,000 Loops - 0.2545 seconds
    a = {}.fromkeys(range(2000000), {})
2,000,000 Loops - 0.9500 seconds
    a = {i: {} for i in range(2000000)}
2,000,000 Loops - 1.4195 seconds
    a = dict((i, {}) for i in range(2000000))

There are many ways to create a dictionary with a default value for each key. Some of them look cool, but most are not as fast as the simple a[i] = value assignment, with an "if i not in a" test added only when you need it. Looking cool is not a good reason to pick a piece of code.

fromkeys is the fastest way to create a dictionary with default values, but you must never use a mutable default. It only creates a single default object and assigns that same object to every key. It is the same issue described in Multiple vs. Single Assignment.
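A quick demonstration of that trap (a sketch of my own, not one of the profiled scripts): every key ends up sharing one dict, so mutating the value under one key changes them all.

    shared = {}.fromkeys(range(3), {})    # one dict object reused for every key
    shared[0]['x'] = 1
    print(shared)    # {0: {'x': 1}, 1: {'x': 1}, 2: {'x': 1}}

    separate = {i: {} for i in range(3)}  # a new dict is built for each key
    separate[0]['x'] = 1
    print(separate)  # {0: {'x': 1}, 1: {}, 2: {}}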