
Don't write module-level code in Python

January 11, 2023

You may have heard that it's bad form to write code at the module level in Python. Why is that, though? In this post I'll answer that question and offer some hints on how to avoid writing code at the module level.

Why not write code at the module level?

The simplest answer is that any code written at the top level of a Python module runs when that module is imported. The first time a module is imported, its top-level code executes much as it would if you ran the file on its own as a script.

Let's try it out. We'll create a Python file called a_module.py:

# a_module.py
print('Hello from a module!')

Now let's make a file called main.py and import a_module:

# main.py
import a_module

If we run main.py we'll see the following:

$ python3 main.py
Hello from a module!

So importing a_module produces the same output as running it by itself:

$ python3 a_module.py
Hello from a module!
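
One detail worth seeing for yourself: Python only runs a module's top-level code on the first import, then reuses the cached module. Here is a minimal sketch of that behaviour. The module name demo_module and the temporary directory are made up for this illustration.

```python
import contextlib
import importlib
import io
import pathlib
import sys
import tempfile

# Write a throwaway module (hypothetical name: demo_module) to a temp directory.
tmp_dir = tempfile.mkdtemp()
(pathlib.Path(tmp_dir) / "demo_module.py").write_text(
    "print('Hello from', __name__)\n"
)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

# Capture whatever the module prints while we import it twice.
output = io.StringIO()
with contextlib.redirect_stdout(output):
    importlib.import_module("demo_module")  # first import: the module body runs
    importlib.import_module("demo_module")  # second import: served from sys.modules

printed_lines = output.getvalue().splitlines()
print(printed_lines)  # ['Hello from demo_module']
```

The body ran once, and inside an imported module __name__ is the module's own name rather than '__main__'.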

Why is this a problem?

TL;DR: writing code at the module level can cause expensive code to run when we don't expect it to, and can cause data to go stale when it seems like it shouldn't.

Truthfully, writing code at the module level isn't a problem in and of itself. In fact it can be useful in some cases. In others, though, it can be a real problem. Let's look at a simple application to find out.

Let's pretend we're creating a set of API endpoints to get information about users in our database. (Don't worry about the specifics of our database and server code; I'm using a generic database package and Flask-like syntax for our endpoints but they're only for illustrative purposes.)

We'll create users.py to hold our database queries so we don't have to repeat them.

# users.py
import database

all_users = database.query('SELECT * FROM users;')

newest_user = database.query('SELECT * FROM users ORDER BY joined_date DESC LIMIT 1;')

Now we'll write a set of endpoints to access the users:

# user_endpoints.py
import server
from users import all_users, newest_user

# define an endpoint to get all users
@server.get('/users/')
def get_all_users():
    return all_users

# define an endpoint to get the newest user
@server.get('/users/newest_user/')
def get_newest_user():
    return newest_user

# run the server
server.run(8080)

The order of operations for user_endpoints.py goes something like this:

Import all_users and newest_user from users.py
users.py defines all_users and newest_user and runs database queries to initialize them
The /users/ and /users/newest_user/ endpoints are defined
The server is started on port 8080

You may already be able to see some problems with this. Let's look at two specific ones:

1. Both queries in `users.py` run as soon as `user_endpoints.py` imports from it

This might be okay when we first start our application. As the application grows and more users join, however, a couple of issues appear:

The all_users query takes longer and longer, slowing down server startup
The results of both queries are held in memory for as long as the server runs. As we add users and other queries, memory usage keeps growing and we may eventually run out of memory on our server.

2. Our data will get stale

Looking again at the order of operations for user_endpoints.py, we'll see that the newest_user variable only gets initialized once.

Let's say the newest user is Joe when we start the server up. If we hit the /users/newest_user/ endpoint it will return Joe.

After we start the server, a new user, Joy, signs up. If we hit the /users/newest_user/ endpoint again it will still return Joe.

Why does this happen? Because the query in users.py runs only once, when user_endpoints.py first imports it, and newest_user keeps that result for as long as the server process is running.

We have the same problem with the all_users query: it will not contain Joy if they sign up after the server starts up.
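
To make the staleness concrete, here is a small self-contained sketch. There is no real database in it: users_table and fake_query_newest are hypothetical stand-ins for the table and the query, and the dates are invented.

```python
# Hypothetical in-memory stand-in for the users table.
users_table = [{"name": "Joe", "joined_date": "2023-01-01"}]


def fake_query_newest():
    """Stand-in for the 'ORDER BY joined_date DESC LIMIT 1' query."""
    return max(users_table, key=lambda user: user["joined_date"])


# Module-level style: the query runs once, as it would at import time.
newest_user = fake_query_newest()


def get_newest_user():
    # The endpoint just hands back the value captured at startup.
    return newest_user


# Joy signs up after the "server" has started.
users_table.append({"name": "Joy", "joined_date": "2023-01-02"})

print(get_newest_user()["name"])  # Joe
```

The table now contains Joy, but the endpoint keeps returning the snapshot taken at startup.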

What's the fix?

The fix is really simple: we can just wrap our code in functions and import those instead. Let's see how our app would look if we did this:

# users.py
import database

def query_all_users():
    return database.query('SELECT * FROM users;')


def query_newest_user():
    return database.query('SELECT * FROM users ORDER BY joined_date DESC LIMIT 1;')

# user_endpoints.py
import server
from users import query_all_users, query_newest_user

# define an endpoint to get all users
@server.get('/users/')
def get_all_users():
    return query_all_users()

# define an endpoint to get the newest user
@server.get('/users/newest_user/')
def get_newest_user():
    return query_newest_user()

# run the server
server.run(8080)

The order of operations for user_endpoints.py now looks like this:

Import query_all_users and query_newest_user from users.py
users.py defines the query_all_users and query_newest_user functions
The /users/ and /users/newest_user/ endpoints are defined
The server is started on port 8080

As you can see, the queries are not run when the server is started. They will only be run when their respective API endpoint gets hit. This has the effect of:

Reducing server startup time and resource usage
Ensuring the data is up to date, since each endpoint runs its query again on every request
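
Both effects can be checked with a rough simulation. As before, the database is faked: fake_query and query_log are hypothetical helpers that record each query instead of talking to a real server.

```python
# Hypothetical stand-ins: a list as the users table and a log of executed queries.
users_table = ["Joe"]
query_log = []


def fake_query(sql):
    """Record the query and return a copy of the current table contents."""
    query_log.append(sql)
    return list(users_table)


def query_all_users():
    return fake_query("SELECT * FROM users;")


# Defining the function runs no query, so "startup" costs nothing.
queries_at_startup = len(query_log)

first_response = query_all_users()   # first request
users_table.append("Joy")            # Joy signs up between requests
second_response = query_all_users()  # second request sees the new user

print(queries_at_startup, len(query_log))  # 0 2
print(first_response, second_response)     # ['Joe'] ['Joe', 'Joy']
```

The trade-off is that every request now pays for a query. If that becomes too slow, caching is a separate decision you can make deliberately rather than by accident.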

What about running a file as a script?

If we run the updated users.py file as a script it won't actually do anything. This is because we wrapped all our code in functions but never actually called them within the users.py file. What if we do want it to do something when we run it as a script, though? Let's say we want to print out the results of the query_all_users and query_newest_user functions.

Python has us covered. All we need to do is add the following to the bottom of our users.py file:

# users.py
...

if __name__ == '__main__':
    print(query_all_users())
    print(query_newest_user())

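The important part is the if __name__ == '__main__' check. Python sets __name__ to '__main__' only when a file is run directly, so we're telling Python "if we run this file as a standalone script, execute this code". When the file is imported instead, __name__ is the module's name and the block is skipped.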
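
If you want to see the guard in action without juggling two files, here is a sketch that uses the standard library's runpy module to execute the same file under two different names. The file guarded.py and the greet function are invented for the example, and running a file under a module name with runpy only approximates a real import.

```python
import contextlib
import io
import pathlib
import runpy
import tempfile

# Write a throwaway script (hypothetical name: guarded.py) that uses the guard.
script = pathlib.Path(tempfile.mkdtemp()) / "guarded.py"
script.write_text(
    "def greet():\n"
    "    return 'Hello from a function!'\n"
    "\n"
    "if __name__ == '__main__':\n"
    "    print(greet())\n"
)


def run_as(name):
    """Execute the script with __name__ set to `name` and return what it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        runpy.run_path(str(script), run_name=name)
    return buffer.getvalue()


as_script = run_as("__main__")  # like: python3 guarded.py
as_import = run_as("guarded")   # roughly like: import guarded

print(repr(as_script))  # 'Hello from a function!\n'
print(repr(as_import))  # ''
```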

Conclusion

In this post we covered two of the biggest problems with module-level code in Python:

Resource-intensive code can run when we don't mean or want it to
Values can get stale when we don't expect them to

The fix is easy: just wrap your code in functions and call them when you need them.

Hopefully this has been helpful!