Don't write module-level code in Python
January 11, 2023
You may have heard that it's bad form to write code at the module level in Python. Why is that, though? In this post I'll answer that question and provide some hints on how to avoid writing code at the module level in Python.
Why not write code at the module level?
The simplest answer is that any code written at the top level of a Python module is run when the module is imported. Basically, importing a module executes its top-level code just as running the file on its own as a script would.
Let's try it out. We'll create a Python file called a_module.py:
# a_module.py
print('Hello from a module!')
Now let's make a file called main.py and import a_module:
# main.py
import a_module
If we run main.py we'll see the following:
$ python3 main.py
Hello from a module!
So basically, importing a_module produces the same output as just running it by itself:
$ python3 a_module.py
Hello from a module!
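One detail worth knowing: Python runs a module's top-level code only the first time the module is imported in a given process. Later imports reuse the cached module from sys.modules. Here is a minimal sketch of that behaviour. The module name a_demo_module and the helper import_twice_and_capture are made up for this illustration.

```python
import contextlib
import importlib
import io
import sys
import tempfile
from pathlib import Path


def import_twice_and_capture():
    """Write a tiny module to a temp dir, import it twice, return what it printed."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical module, equivalent to a_module.py from the example above.
        Path(tmp, "a_demo_module.py").write_text("print('Hello from a module!')\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()  # make sure the freshly written file is seen
        try:
            buffer = io.StringIO()
            with contextlib.redirect_stdout(buffer):
                importlib.import_module("a_demo_module")  # first import: body runs
                importlib.import_module("a_demo_module")  # second import: cached
            return buffer.getvalue()
        finally:
            # Clean up so the demo leaves no trace in this process.
            sys.path.remove(tmp)
            sys.modules.pop("a_demo_module", None)


if __name__ == "__main__":
    # The greeting appears once, even though the module was imported twice.
    print(import_twice_and_capture(), end="")
```

This caching is why module-level code runs once per process rather than once per import statement, which matters for the stale-data problem discussed later.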
Why is this a problem?
TL;DR: writing code at the module level can cause expensive code to run when we don't expect it to and can cause data to go stale when it seems like it shouldn't.
Truthfully, writing code at the module level isn't a problem in and of itself. In fact, it can be useful in some cases. In others, though, it can be a real problem. Let's look at a simple application to find out why.
Let's pretend we're creating a set of API endpoints to get information about users in our database. (Don't worry about the specifics of our database and server code; I'm using a generic database package and Flask-like syntax for our endpoints but they're only for illustrative purposes.)
We'll create users.py to hold our database queries so we don't have to repeat them.
# users.py
import database
all_users = database.query('SELECT * FROM users;')
newest_user = database.query('SELECT * FROM users ORDER BY joined_date DESC LIMIT 1;')
Now we'll write a set of endpoints to access the users:
# user_endpoints.py
import server
from users import all_users, newest_user

# define an endpoint to get all users
@server.get('/users/')
def get_all_users():
    return all_users

# define an endpoint to get the newest user
@server.get('/users/newest_user/')
def get_newest_user():
    return newest_user

# run the server
server.run(8080)
The order of operations for user_endpoints.py goes something like this:
- Import all_users and newest_user from users.py
- users.py defines all_users and newest_user and runs database queries to initialize them
- The /users/ and /users/newest_user/ endpoints are defined
- The server is started on port 8080
You may already be able to see some problems with this. Let's look at two specific ones:
1. Both queries from users.py are run as soon as they are imported in user_endpoints.py
This might be okay when we first start our application. As our application grows and users join, however, it causes a couple of problems:
- The all_users query takes longer and longer, causing server startup times to slow
- The results of both queries are stored in memory for as long as the server runs. As we add users and other queries, we may eventually run out of memory on our server.
2. Our data will get stale
Looking again at the order of operations of user_endpoints.py, we'll see that the newest_user variable only gets initialized once.
Let's say the newest user is Joe when we start the server up. If we hit the /users/newest_user/ endpoint it will return Joe.
After we start the server, a new user, Joy, signs up. If we hit the /users/newest_user/ endpoint again it will still return Joe.
Why does this happen? Because the newest_user variable is only initialized once, when users.py is first imported by user_endpoints.py.
We have the same problem with the all_users query: it will not contain Joy if they sign up after the server starts up.
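The staleness is easy to reproduce without a real database. The sketch below swaps the database for a plain Python list; fake_users_table and query_newest_user are stand-ins invented for this demo, not part of any real package.

```python
# Stand-in for the users table, ordered by join date (oldest first).
# This list and query_newest_user are hypothetical; no real database is involved.
fake_users_table = ["Joe"]


def query_newest_user():
    """Pretend to run the 'newest user' query against the fake table."""
    return fake_users_table[-1]


# Module-level style: the query runs once, right here, and the result is frozen.
newest_user = query_newest_user()

# Later, while the "server" is still running, Joy signs up.
fake_users_table.append("Joy")

print(newest_user)          # Joe (stale: captured before Joy joined)
print(query_newest_user())  # Joy (fresh: the query runs again)
```

The module-level variable keeps whatever the query returned at import time, no matter what happens to the underlying data afterwards.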
What's the fix?
The fix is really simple: we can just wrap our code in functions and import those instead. Let's see how our app would look if we did this:
# users.py
import database

def query_all_users():
    return database.query('SELECT * FROM users;')

def query_newest_user():
    return database.query('SELECT * FROM users ORDER BY joined_date DESC LIMIT 1;')
# user_endpoints.py
import server
from users import query_all_users, query_newest_user

# define an endpoint to get all users
@server.get('/users/')
def get_all_users():
    return query_all_users()

# define an endpoint to get the newest user
@server.get('/users/newest_user/')
def get_newest_user():
    return query_newest_user()

# run the server
server.run(8080)
The order of operations for user_endpoints.py now looks like this:
- Import query_all_users and query_newest_user from users.py
- users.py defines the query_all_users and query_newest_user functions
- The /users/ and /users/newest_user/ endpoints are defined
- The server is started on port 8080
As you can see, the queries are not run when the server is started. They will only be run when their respective API endpoint gets hit. This has the effect of:
- Reducing server startup time and resource usage
- Ensuring the data is up to date, since each endpoint runs its query on every request
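Both effects can be checked with a rough, self-contained sketch. Everything here is a stand-in made up for the demo: fake_users_table plays the database, the get decorator and routes dictionary play the Flask-like server, and queries_run counts how often the query executes.

```python
# Hypothetical stand-ins for the database and server packages used in this post.
fake_users_table = ["Joe"]
queries_run = 0
routes = {}


def query_newest_user():
    """Pretend to query the database, counting how often we are called."""
    global queries_run
    queries_run += 1
    return fake_users_table[-1]


def get(path):
    """Minimal Flask-like decorator: remember which function handles which path."""
    def register(handler):
        routes[path] = handler
        return handler
    return register


@get('/users/newest_user/')
def get_newest_user():
    return query_newest_user()


# "Startup" is finished: the endpoint is registered, but no query has run yet.
queries_after_startup = queries_run

# Simulate two requests with a signup in between.
first_response = routes['/users/newest_user/']()
fake_users_table.append("Joy")
second_response = routes['/users/newest_user/']()

print(queries_after_startup)  # 0
print(first_response)         # Joe
print(second_response)        # Joy
print(queries_run)            # 2
```

Defining the endpoint costs nothing at startup, and each simulated request pays for exactly one query and sees the current data.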
What about running a file as a script?
If we run the updated users.py file as a script it won't actually do anything. This is because we wrapped all our code in functions but never actually called them within the users.py file. What if we do want it to do something when we run it as a script, though? Let's say we want to print out the results of the query_all_users and query_newest_user functions.
Python has us covered. All we need to do is add the following to the bottom of our users.py file:
# users.py
...
if __name__ == '__main__':
    print(query_all_users())
    print(query_newest_user())
The important part is the if __name__ == '__main__' check. Basically, we're telling Python "if this file is run as a standalone script, execute this code". When the file is imported instead, the block is skipped.
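The check works because Python sets __name__ to '__main__' in the file being run directly, and to the module's own name when the file is imported. Here is a small sketch that simulates both cases with the standard library's runpy module. The file name users_demo.py and its contents are made up for this illustration.

```python
import runpy
import tempfile
from pathlib import Path

# A tiny module that records the value of __name__ it was executed with.
# The file name users_demo.py is made up for this sketch.
SOURCE = (
    "seen_name = __name__\n"
    "ran_main_block = False\n"
    "if __name__ == '__main__':\n"
    "    ran_main_block = True\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "users_demo.py")
    path.write_text(SOURCE)
    # Roughly what "python3 users_demo.py" does: __name__ is '__main__'.
    as_script = runpy.run_path(str(path), run_name="__main__")
    # Roughly what "import users_demo" does: __name__ is the module's name.
    as_import = runpy.run_path(str(path), run_name="users_demo")

print(as_script["seen_name"], as_script["ran_main_block"])  # __main__ True
print(as_import["seen_name"], as_import["ran_main_block"])  # users_demo False
```

The same source file behaves differently depending only on the name it is executed under, which is exactly what the guard relies on.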
Conclusion
In this post we covered two of the biggest problems with module-level code in Python:
- Running resource-intensive code when we don't mean or want to
- Values going stale when we don't expect them to
The fix is easy: just wrap your code in functions and call them when you need them.
Hopefully this has been helpful!