Introductory Programming in Python: Lesson 15
Importing Standard Modules

The import Statement

We have already learnt the basics of programming. We have all the tools we will ever need to solve any problem we can solve in our heads. But many times we will find ourselves reinventing the wheel. Either a wheel we've made ourselves, e.g. writing the same piece of code over and over again in different programs, or even a wheel someone else has made, e.g. solving a problem someone else has already solved. The point is the code has already been written, and what we want is a convenient way to package it so that we can use it without rewriting it. And thus the module was born.

Modules allow us to write code, and separate it out from other code into a different file, known as a module. We can use the import statement to include the code of another file, either in the current directory, or in python's search path, in place. What this means is that the file is loaded and run as if its code had been in the place of the import statement. As such, any statements in the file are executed, including function definitions, class definitions, assignments, etc.

import <module_name>

'module_name' is the name of a python file without the '.py' extension. Any python program can be imported as a module, although most often modules contain function definitions and a minimum of initialisation code, allowing us to keep commonly used function definitions in a place where they can be reused without rewriting them, and more importantly collecting function definitions that are related, e.g. all database functions, together and separate from our main code.
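To make the effect of import concrete, here is a minimal sketch. It assumes nothing about your own files: it writes a throwaway module to a temporary directory, adds that directory to python's search path, and imports it. The module name 'greeter_demo' and its contents are made up for this illustration, and print is called with a single parenthesised argument so the sketch also runs on newer versions of python.

```python
import os
import sys
import tempfile

# Write a tiny throwaway module to a temporary directory.
# 'greeter_demo' is a made-up module name for this illustration.
module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, "greeter_demo.py")
module_file = open(module_path, "w")
module_file.write("greeting = 'hello'\n")
module_file.write("def shout(text):\n")
module_file.write("    return text.upper() + '!'\n")
module_file.close()

# Add the directory to python's search path so import can find the file.
sys.path.insert(0, module_dir)

import greeter_demo

# The assignment and the function definition in the file have been executed.
print(greeter_demo.greeting)
print(greeter_demo.shout(greeter_demo.greeting))
```

In everyday use the module file would simply sit next to the main program, and the lines that create it and adjust the search path would not be needed.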

Namespaces

If we import a python file that has a global variable 'i' into a main program that also has a global variable 'i', there could be some confusion. The module's 'i' variable is in the global scope of the module, but where is it in the scope of the program as a whole? Python solves this problem with namespaces. Formally, a module, when imported, introduces its own scope block, called the module scope. It is a global scope in its own right. Variables within a module's global scope are forcibly kept separate from those of its super-programs or super-modules (i.e. those programs or modules that import it). The global statement does not allow variable references or assignments to cross module scope boundaries. Instead, all names (variables, functions, or classes) are created within the module's 'namespace', meaning they must be explicitly referenced from outside the module by first naming the module itself, as in

<module_name>.<symbol_name>

Where symbol_name is the name of a variable that has been assigned a value, a function that has been defined, or a class that has been defined.

#my_module.py

i = 2

#main.py
i = 1

import my_module
print i
print my_module.i

print "---"

my_module.i = 3
print i
print my_module.i

Running main.py produces the following output

1
2
---
1
3

The import statement executes the assignment of 2 to 'i' in my_module.py after 'i' has already been assigned the value 1 in main.py, yet the value of 'i' in main.py is not changed. As we can see, the two 'i's are kept completely separate. From main.py, if we wish to reference the 'i' in my_module.py, we must do so explicitly, using the module name: my_module.i. We also see that we can assign a value to a variable in a module's namespace using the same syntax.
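The claim that the global statement cannot cross module boundaries can be checked with a small sketch. As an assumption for illustration only, it creates a throwaway module called 'counter_demo' in a temporary directory; the module contains its own 'i' and a hypothetical function 'bump' that changes 'i' using global.

```python
import os
import sys
import tempfile

# Create a throwaway module; 'counter_demo' and 'bump' are made-up names.
module_dir = tempfile.mkdtemp()
module_file = open(os.path.join(module_dir, "counter_demo.py"), "w")
module_file.write("i = 2\n")
module_file.write("def bump():\n")
module_file.write("    global i\n")
module_file.write("    i = i + 10\n")
module_file.close()
sys.path.insert(0, module_dir)

i = 1
import counter_demo

# 'global i' inside bump refers to the module's i, not to ours.
counter_demo.bump()

print(i)               # the main program's i is untouched
print(counter_demo.i)  # the module's own i was changed
```

The 'global' inside bump reaches only as far as the global scope of counter_demo, so the two variables stay separate even though a function was called from the main program.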

The dir function

dir(<object>)

The dir function is a built-in function that takes one parameter and lists all the attributes of the object given. Methods, properties, and variables are all considered attributes. If this isn't making sense to you yet, don't worry, we will cover it all in the sections on object oriented programming. What we need to know now is that although the dir function is not very helpful in a program, it is incredibly useful when using the python interactive interpreter. As modules are objects in python (in fact, everything is an object), we can call dir on a module to obtain a list of everything in the module that can be referenced with the dot notation (<module_name>.<attribute>).

Python 2.4.3 (#1, Oct	2 2006, 21:50:13) 
[GCC 3.4.6 (Gentoo 3.4.6-r1, ssp-3.4.5-1.0, pie-8.7.9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import my_module
>>> dir(my_module)
['__builtins__', '__doc__', '__file__', '__name__', 'i']
>>>

In the example, we import my_module.py (from above), and run 'dir' on it. We are returned a list containing some things we expect, namely 'i', which is a variable assigned a value within the module my_module, and some things we don't expect: ['__builtins__', '__doc__', '__file__', '__name__']. Short of delving into both object oriented programming and exception handling all at once, these elements cannot be easily explained now, and are not especially important to us.
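Since the names wrapped in double underscores are not important to us yet, a short sketch can filter them out. It uses the standard math module purely as a convenient example; the variable name 'public_names' is an arbitrary choice.

```python
import math

# Collect the names in the math module, skipping the ones wrapped in
# double underscores.
public_names = []
for name in dir(math):
    if not name.startswith("__"):
        public_names.append(name)

print(public_names)
print("sqrt" in public_names)
```

The exact list printed depends on the version of python installed, which is one reason dir is more useful for exploring at the interactive prompt than inside a finished program.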

The locals and globals Functions

Of more direct use to us whilst programming are the locals and globals functions. Both take no parameters, and both return dictionaries. globals() returns a dictionary of objects available in the global scope, whilst locals() does the same for the local scope. The practical use of this is that we can make even the names of our variables dynamic, i.e. we can use one variable ('a') to store the name of another variable ('b'), by using 'a' as a key into the dictionary returned by either locals or globals.

#!/usr/bin/python

#ask the user for a variable name
vname = raw_input("Enter a variable name: ")

#ask the user for a value
value = raw_input("Enter a value for %s: "%vname)
locals()[vname] = value

print "vname: ", vname
print "locals()[vname]: ", locals()[vname]
print "value: ", value

#assume the user inputted bob
print "bob: ", bob

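This outputs the following when 'bob' is entered as a variable name, and 4 as a value. Note how 'bob' becomes available as a normal variable name, as seen in the last line, once it has been added to the locals() dictionary. This works because the code runs at the top level of the program, where locals() and globals() return the same dictionary; inside a function, assigning into the dictionary returned by locals() is not guaranteed to create a variable.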

Enter a variable name: bob
Enter a value for bob: 4
vname:  bob
locals()[vname]:  4
value:  4
bob:  4
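The same idea can be sketched with globals(), alongside an ordinary dictionary for comparison. To keep the sketch runnable without typing anything, the name and value are fixed in the code rather than read with raw_input; 'user_values' is a hypothetical name chosen for this illustration.

```python
# Names and values are fixed here instead of being read with raw_input,
# so the sketch runs without user interaction.
vname = "bob"
value = "4"

# Store the value under a dynamically chosen global name.
globals()[vname] = value
print(globals()[vname])

# A plain dictionary gives the same flexibility without creating
# variables whose names are unknown until the program runs.
user_values = {}
user_values[vname] = value
print(user_values["bob"])
```

A dictionary of our own is often the safer choice, because a name typed in by the user cannot then accidentally replace a variable the program relies on.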

An Overview of Selected Standard Modules

Ultimately, the true use of modules is to get other people to do the work for us. We are programmers, and laziness is a virtue, after all. The comprehensive list of standard python modules, i.e. modules that come with a standard install of the latest version (currently 2.5), can be found in the official python library documentation. Of particular interest to us, by virtue of being the most commonly used, are

math: Provides various non-basic mathematical functions, inter alia: trigonometric functions, hyperbolic functions, exponent and logarithm functions, and mathematical constants (e.g. pi and e)
random: Provides various methods of choosing random numbers, or making random choices from sequences, using a variety of statistical distributions.
os: Provides multitudinous operating system related functions, primarily file and process management functions.
sys: Provides a large number of variables and functions dealing with the internals of the python interpreter itself, and the environment in which it was invoked. Specific uses of various sys variables are discussed shortly.
time: Provides various functions for obtaining the current time, and manipulating variables that store values representing time.
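As a quick taste of these modules, the sketch below calls one or two well-known functions from each. The selection is only a sample chosen for illustration, and most of the printed values will differ from run to run and machine to machine.

```python
import math
import os
import random
import sys
import time

# math: constants and non-basic functions
print(math.pi)
print(math.sqrt(16))

# random: pick an element from a sequence, or a number from a range
bases = ["A", "C", "G", "T"]
print(random.choice(bases))
print(random.randint(1, 6))

# os: operating system services, such as the current directory
print(os.getcwd())

# sys: details of the interpreter itself
print(sys.version)

# time: seconds since the epoch, and a readable form of the same moment
now = time.time()
print(time.ctime(now))
```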

The sys Module

The sys module is quite extensive, but by far the most useful aspects of it are access to what are known as the standard streams, and to the process's command line arguments. When a program is run from the command line, for example python, we can pass the program itself parameters, as if it were a function. In operating system shell speak they are known as arguments. We pass an argument to python to tell it what python file to run,

sirlark@hephaestus ~ $ python my_file.py

We can actually pass any number of arguments to a program we run from the command line, each separated by some amount of white space. If white space is enclosed in quotes of some kind, it does not separate arguments, but rather is included within one command line argument. In python's case, arguments that come after the specification of the filename to run are passed through to the program being run as its own command line arguments.

sirlark@hephaestus ~ $ python my_prog.py somefile.fasta 7 "ACCTGT AGTCA"

In the example above, the program my_prog.py receives three command line arguments, namely 'somefile.fasta', '7', and 'ACCTGT AGTCA'. Note the space within the sequence snippet, and the fact that the 7 arrives as the string '7', not as a number.
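The splitting rules described above can be imitated inside python with the standard shlex module. This is only an approximation used for illustration: no shell is involved and my_prog.py is never actually run.

```python
import shlex

# The command line from the example above, as one string.
command_line = 'python my_prog.py somefile.fasta 7 "ACCTGT AGTCA"'

# shlex.split breaks the string up using shell-like quoting rules.
words = shlex.split(command_line)
print(words)

# Everything after the file name is what my_prog.py would receive.
arguments = words[2:]
print(arguments)
```

The quoted sequence stays together as a single element, spaces included, while the unquoted words are split wherever white space occurs.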

So now we have a really convenient way of passing small amounts of input into our programs, instead of prompting the user with raw_input calls all the time. Command line arguments are especially useful for specifying filenames and options that influence how our program processes data, and are preferable to raw_input calls for two reasons. First, when entering filenames, command line arguments allow the user to use shell expansions and tab completions. Second, our program can be run without user intervention if it's likely to take a long time to run, e.g. the user can specify a number of files to process on the command line (probably using a '*.fasta' or similar) and walk away. Suppose each file takes 20 or more minutes to process. If we were using raw_input calls to prompt for the next file and the user came back the next morning, our program would have processed only the first file, and would be waiting for the entry of the second filename. Command line arguments avoid this issue neatly.

But how, in python, do we access the command line arguments that have been given to our program? Well, the sys module provides a list called 'argv' (for argument vector), which contains as its elements the command line arguments passed to our program, in order. Examine the following simple program. Run it a few times providing different command line arguments each time, and test it with quoted arguments containing spaces or tabs.

#!/usr/bin/python
#commargs.py

#import the sys module
import sys

#print out our command line arguments
print sys.argv

sirlark@hephaestus ~/scratch $ python commargs.py Hello
['commargs.py', 'Hello']
sirlark@hephaestus ~/scratch $ python commargs.py Hello There
['commargs.py', 'Hello', 'There']
sirlark@hephaestus ~/scratch $ python commargs.py "Hello There"
['commargs.py', 'Hello There']
sirlark@hephaestus ~/scratch $ python commargs.py "Hello There" 47
['commargs.py', 'Hello There', '47']
sirlark@hephaestus ~/scratch $

Note how the first element of sys.argv is always the name of our python program. The remaining elements are the arguments we gave on the command line. Also note how all the elements, even the 47, are in fact strings.
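Because every element is a string, a program usually converts the arguments it needs before using them. The sketch below shows one way this might look; the function name 'read_arguments' and the expected order of arguments (filename, count, sequence) are assumptions made for illustration. A hand-written list stands in for sys.argv so that the sketch runs without a command line.

```python
def read_arguments(argv):
    # argv[0] is the program name; the real arguments start at index 1.
    if len(argv) != 4:
        # Wrong number of arguments: signal the problem to the caller.
        return None
    filename = argv[1]
    count = int(argv[2])      # convert the string '7' to the number 7
    sequence = argv[3]
    return filename, count, sequence

# A hand-written list stands in for sys.argv here; a real program
# would import sys and call read_arguments(sys.argv) instead.
example_argv = ["my_prog.py", "somefile.fasta", "7", "ACCTGT AGTCA"]
print(read_arguments(example_argv))
```

Note that int will fail with an error if the second argument is not a whole number, so a more careful program would check for that as well.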

The Standard Streams

Whenever a python program is run, it opens three files automatically, called standard input (stdin), standard output (stdout), and standard error (stderr). Generally these files represent keyboard input, screen output, and error output respectively, but they can be redirected, causing the input to our program to come from the output of another, for example, as when used in a shell pipe ('|'). Similarly, our program's output could be redirected to a file, or piped into the input of another process. However, all these effects are transparent to us: the print statement actually writes to stdout, and the raw_input function reads from stdin. If we want to write to stderr, however, we must do so explicitly. The three standard streams are all accessible via the sys module as

sys.stdin: standard input
sys.stdout: standard output
sys.stderr: standard error. This stream exists to differentiate error output from normal output, so that we can use the shell to split off errors from processes executed in large batches, and not have to deal with all the normal output they produce. This stream is usually outputted to the screen, unless it has been redirected, which means that programs can generally issue error messages to the screen even if their normal output has been redirected.

Each of these is a file object, so sys.stdin, which is opened only for reading, has the usual .read, .readline, and .readlines methods, whilst the other two, stdout and stderr, being opened only for writing, have the .write and .writelines methods available. Here's an example of how to use stdout and stderr to split output effectively.

#!/usr/bin/python

import sys

#get a line of input from the user
print "Please enter a phrase: "	#this gets sent to stdout
phrase = sys.stdin.readline().strip()	#this gets read from stdin; strip() removes the trailing newline

if phrase.isdigit():
    sys.stderr.write("Phrase entered was a number\n")	#this gets sent to stderr
else:
    sys.stdout.write("Phrase contains %i 'a' characters\n"%phrase.count('a'))

There's one more feature of the print statement that now becomes important, namely redirection. We can specify a standard stream (or in fact any file object open for writing) to print to, instead of standard output. The syntax is:

print >> <file object>, <expression>[, <expression>...]

So the above example could be redone as follows:

#!/usr/bin/python

import sys

#get a line of input from the user
print "Please enter a phrase: "	#this gets sent to stdout
phrase = sys.stdin.readline().strip()	#this gets read from stdin; strip() removes the trailing newline

if phrase.isdigit():
    print >> sys.stderr, "Phrase entered was a number"	#this gets sent to stderr
else:
    print "Phrase contains %i 'a' characters"%phrase.count('a')
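The same split between normal and error output can be sketched as a function that is told which streams to write to. The function name 'report' and its parameters 'out' and 'err' are hypothetical, chosen for this illustration; by default they are the real standard streams. The .write method is used instead of print so the sketch also runs on newer versions of python.

```python
import sys

def report(phrase, out=sys.stdout, err=sys.stderr):
    # Remove the trailing newline that readline() leaves on the input.
    phrase = phrase.strip()
    if phrase.isdigit():
        err.write("Phrase entered was a number\n")
    else:
        out.write("Phrase contains %i 'a' characters\n" % phrase.count("a"))

report("banana\n")   # normal output, goes to stdout
report("12345\n")    # error output, goes to stderr
```

If this were saved under a hypothetical name such as report.py and run as 'python report.py > normal.txt', only the first message should end up in the file, while the second would still appear on the screen.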

Exercises

What is redirection, and what are its effects?
When, and why, would you want to output to stderr?
When, and why, are command line arguments preferable to taking input from the keyboard?
Can you think of uses for sys.argv[0]? What are they?