We have already learnt the basics of programming. We have all the tools we will ever need to solve any problem we can solve in our heads. But many times we will find ourselves reinventing the wheel. Either a wheel we've made ourselves, e.g. writing the same piece of code over and over again in different programs, or even a wheel someone else has made, e.g. solving a problem someone else has already solved. The point is the code has already been written, and what we want is a convenient way to package it so that we can use it without rewriting it. And thus the module was born.
Modules allow us to write code, and separate it out from other code into a different file, known as a module. We can use the import statement to include the code of another file, either in the current directory, or in python's search path, in place. What this means is that the file is loaded and run as if its code had been in the place of the import statement. As such, any statements in the file are executed, including function definitions, class definitions, assignments, etc.
import <module_name>
'module_name' is the name of a python file without the '.py' extension. Any python program can be imported as a module, although most often modules contain function definitions and a minimum of initialisation code, allowing us to keep commonly used function definitions in a place where they can be reused without rewriting them, and more importantly collecting function definitions that are related, e.g. all database functions, together and separate from our main code.
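To see that importing really does run the statements in the file, here is a minimal sketch. The module name greetings_demo and its contents are made up for illustration, and the file is written to a temporary directory only so that the example is self-contained. It is written to run under both Python 2.6+ and Python 3.

```python
import os
import sys
import tempfile

# Write a small, hypothetical module file into a temporary directory.
module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, "greetings_demo.py")
with open(module_path, "w") as handle:
    handle.write("loaded = True\n")
    handle.write("def greet(name):\n")
    handle.write("    return 'Hello, ' + name\n")

# Add that directory to python's search path so the import can find the file.
sys.path.insert(0, module_dir)

# The import runs every top-level statement in greetings_demo.py:
# the assignment to 'loaded' and the definition of 'greet'.
import greetings_demo

print(greetings_demo.loaded)
print(greetings_demo.greet("world"))
```

In everyday use we would simply save the module next to our main program and import it; the temporary directory is just scaffolding for the sketch.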
If we have a python file which we wish to import, that has a global variable 'i', into our main program, also with a global variable 'i', there could be some confusion. The module's 'i' variable is in the global scope of the module, but where is it in the scope of the program as a whole? Python solves this problem with the introduction of namespaces. Formally, a module when imported introduces its own scope block, called the module scope. It is a global scope in its own right. Variables within a module's global scope are forcibly kept separate from those of its super-programs, or super-modules (i.e. those programs or modules that import it). The global statement does not allow variable references or assignments to cross module scope boundaries. Instead all names (variables, functions, or classes) are created within a module's 'namespace', meaning they must be explicitly referenced from outside the module by first naming the module itself, as in
<module_name>.<symbol_name>
Where symbol_name is the name of a variable assigned a value, a function defined, or a class defined in the module.
#my_module.py
i = 2
#main.py
i = 1
import my_module
print i
print my_module.i
print "---"
my_module.i = 3
print i
print my_module.i
Running main.py produces the following output
1
2
---
1
3
Although 'i' is assigned the value 2 in my_module.py after it is assigned the value 1 in main.py (the import statement runs the module's assignment after main.py's initial assignment), the value of 'i' in main.py is not changed. As we can see, the two 'i's are kept completely separate. From main.py, if we wish to reference the 'i' in my_module.py, we must do so explicitly, using the module name: my_module.i. We also see that we can assign a value to a variable in a module's namespace, using the same syntax.
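The claim that the global statement cannot cross module boundaries can be checked with a small sketch. To keep it self-contained, the module here is built in memory with types.ModuleType and exec instead of being imported from a file; the module name counter_demo and the function bump are invented for illustration.

```python
import types

# Build a stand-in module in memory; normally this code would live in a
# file called counter_demo.py and be loaded with 'import counter_demo'.
counter_demo = types.ModuleType("counter_demo")
source = """
i = 2
def bump():
    global i      # refers to the module's own 'i'
    i = i + 1
"""
exec(source, counter_demo.__dict__)

i = 1                  # our program's own global 'i'
counter_demo.bump()    # changes counter_demo.i only

print(i)
print(counter_demo.i)
```

The call to bump changes counter_demo.i from 2 to 3, while the 'i' in our own program stays at 1.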
dir(<object>)
The dir function is a built-in function that takes one parameter, and lists all the attributes of the object given. Methods, properties, and variables are all considered attributes. If this isn't making sense to you yet, don't worry, we will cover it all in the sections on object oriented programming. What we need to know now is that although the dir function is not very helpful in a program, it is incredibly useful when using the python interactive interpreter. As modules are objects in python (in fact everything is an object), we can call dir on a module to obtain a list of everything in the module that can be referenced with the dot notation (<module_name>.<attribute>).
Python 2.4.3 (#1, Oct 2 2006, 21:50:13)
[GCC 3.4.6 (Gentoo 3.4.6-r1, ssp-3.4.5-1.0, pie-8.7.9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import my_module
>>> dir(my_module)
['__builtins__', '__doc__', '__file__', '__name__', 'i']
>>>
In the example, we import my_module.py (from above), and run dir on it. We are returned a list containing some names we expect, namely 'i', which is a variable assigned a value within the module my_module, and some we don't expect: ['__builtins__', '__doc__', '__file__', '__name__']. Short of delving into both object oriented programming and exception handling all at once, these elements cannot be easily explained now, and are not especially important to us.
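If the double-underscore names get in the way, we can filter them out of the list that dir returns. The sketch below uses an in-memory stand-in for my_module (created with types.ModuleType) so that it runs without the file from the earlier example; with the real module, a plain import my_module would take its place.

```python
import types

# Stand-in for 'import my_module': a module object with one variable, i.
my_module = types.ModuleType("my_module")
my_module.i = 2

# Keep only the names that do not start with a double underscore.
public_names = [name for name in dir(my_module) if not name.startswith("__")]
print(public_names)
```

The exact set of double-underscore names differs between python versions, which is one more reason to filter them out when we only care about what the module's author defined.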
Of more direct use to us whilst programming are the locals and globals functions. Both take no parameters, and both return dictionaries. globals() returns a dictionary of objects available in the global scope, whilst locals() does the same but for the local scope. The practical use of this is that we can make even the names of our variables dynamic, i.e. we can use a variable ('a') to store another variable's ('b') name, by using 'a' as a key into the dictionary returned by either locals or globals.
#!/usr/bin/python
#ask the user for a variable name
vname = raw_input("Enter a variable name: ")
#ask the user for a value
value = raw_input("Enter a value for %s: "%vname)
locals()[vname] = value
print "vname: ", vname
print "locals()[vname]: ", locals()[vname]
print "value: ", value
#assume the user inputted bob
print "bob: ", bob
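This produces the following output when 'bob' is entered as a variable name, and 4 as a value. Note how 'bob' becomes available as a normal variable name, as seen in the last line, once it has been added to the locals() dictionary. This works here because the code runs at the top level of the program, where locals() and globals() return the same dictionary; inside a function, assigning into the dictionary returned by locals() is not guaranteed to create or change a variable.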
Enter a variable name: bob
Enter a value for bob: 4
vname: bob
locals()[vname]: 4
value: 4
bob: 4
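The same idea can be sketched without any user input. The helper name store is invented for this example. It shows the globals() route next to an ordinary dictionary, which is often the safer choice because it cannot accidentally overwrite one of our real variables.

```python
def store(namespace, name, value):
    """Bind value to name inside the given dictionary."""
    namespace[name] = value

# Names at the top level of a program live in the dictionary that
# globals() returns, so this creates a global variable called bob.
store(globals(), "bob", "4")
print(globals()["bob"])

# A plain dictionary gives the same dynamic lookup by name,
# kept apart from the program's own variables.
user_values = {}
store(user_values, "bob", "4")
print(user_values["bob"])
```

Which route to take is a design choice: globals() makes the name usable as a normal variable, whilst a separate dictionary keeps user-chosen names from clashing with names our program depends on.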
Ultimately, the true use of modules is to get other people to do the work for us. We are programmers, and thus laziness is a virtue, after all. The comprehensive list of standard python modules, i.e. modules that come with a standard install of the latest version (currently 2.5), can be found here. Of particular interest to us, because they are so commonly used, are a few of these modules, beginning with sys.
The sys module is quite extensive, but by far the most useful aspects of it are access to what are known as the standard streams, and to the process's command line arguments. When a program is run from the command line, for example python, we can pass the program itself parameters, as if it were a function. In operating system shell speak they are known as arguments. We pass an argument to python to tell it what python file to run,
sirlark@hephaestus ~ $ python my_file.py
We can actually pass any number of arguments to a program we run from the command line, each separated by some amount of white space. If white space is enclosed in quotes of some kind, it does not separate arguments, but rather is included within one command line argument. In python's case, arguments that come after the specification of the filename to run are passed through to the program being run as its own command line arguments.
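Suppose, for example, that my_prog.py is run with the arguments somefile.fasta 7 "ACCTGT AGTCA" following the filename. The program my_prog.py then receives three command line arguments, namely 'somefile.fasta', '7', and 'ACCTGT AGTCA'. Note the space within the sequence snippet, and the fact that the 7 arrives in quotes, i.e. as a string rather than a number.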
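The way the shell splits a command line into arguments can be approximated from within python using the standard shlex module. This is only a sketch: the command line string below is reconstructed from the three arguments named in the text, and shlex mimics the common quoting rules without being the shell itself.

```python
import shlex

# An assumed command line, matching the three arguments described in the text.
command_line = 'python my_prog.py somefile.fasta 7 "ACCTGT AGTCA"'

# Split on white space, except where the white space is inside quotes.
words = shlex.split(command_line)

# The first two words are the interpreter and the file to run;
# everything after them is passed through to the program.
program_args = words[2:]
print(program_args)
```

The quoted sequence stays together as a single argument, and every argument, including the 7, is a string.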
So now we have a really convenient way of passing small amounts of input into our programs, instead of prompting the user with raw_input calls all the time. Command line arguments are especially useful for specifying filenames and options that influence how our program processes data, and are preferable to raw_input calls for two reasons. First, when entering filenames, command line arguments allow the user to use shell expansions and tab completion. Second, our program can be run without user intervention if it's likely to take a long time to run, e.g. the user can specify a number of files to process on the command line (probably using '*.fasta' or similar) and walk away. Suppose each file takes 20 or more minutes to process. If we were using raw_input to prompt for the next file and the user came back the next morning, our program would have processed only the first file, and would be waiting for the entry of the second filename. Command line arguments avoid this issue neatly.
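A batch-style program of this kind might be laid out as in the sketch below. The function process_file is a hypothetical placeholder for the real, slow work, and the filenames are read from sys.argv, the list of command line arguments that the sys module provides.

```python
import sys

def process_file(filename):
    # Hypothetical stand-in for the real, slow work done on one file.
    return "processed %s" % filename

def process_all(argv):
    # The first element is the program's own name, so the
    # filenames given by the user start at the second element.
    return [process_file(filename) for filename in argv[1:]]

if __name__ == "__main__":
    # Works through every file named on the command line, no prompting needed.
    for line in process_all(sys.argv):
        print(line)
```

Saved under a name of our choosing, say batch.py, it could be started with python batch.py *.fasta and left to run; the shell expands the pattern into one argument per matching file.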
But how, in python, do we access the command line arguments that have been given to our program? Well, the sys module provides a list called 'argv' (for argument vector), which contains as its elements the command line arguments passed to our program, in order. Examine the following simple program. Run it a few times providing different command line arguments each time, and test it with quoted arguments containing spaces or tabs.
#!/usr/bin/python
#commargs.py
#import the sys module
import sys
#print out our command line arguments
print sys.argv
sirlark@hephaestus ~/scratch $ python commargs.py Hello
['commargs.py', 'Hello']
sirlark@hephaestus ~/scratch $ python commargs.py Hello There
['commargs.py', 'Hello', 'There']
sirlark@hephaestus ~/scratch $ python commargs.py "Hello There"
['commargs.py', 'Hello There']
sirlark@hephaestus ~/scratch $ python commargs.py "Hello There" 47
['commargs.py', 'Hello There', '47']
sirlark@hephaestus ~/scratch $
Note how the first element of sys.argv is always the name of our python program. Thereafter, the elements are the arguments we gave on the command line. Also note how all the elements, even the 47, are in fact strings.
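Because every element is a string, any argument we intend to use as a number has to be converted explicitly. A minimal sketch follows; the function name parse_arguments and the meaning given to each position are assumptions made for illustration, and the list is written out by hand in the shape sys.argv had in the last run above.

```python
def parse_arguments(argv):
    program_name = argv[0]   # always the name of the program itself
    phrase = argv[1]         # stays a string
    count = int(argv[2])     # '47' becomes the integer 47
    return program_name, phrase, count

# The same list that sys.argv held in the final run shown above.
name, phrase, count = parse_arguments(["commargs.py", "Hello There", "47"])
print(count + 1)
```

If the third argument were not a valid number, int would raise a ValueError, so a real program would want to check for that.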
Whenever a python program is run, it opens three files automatically, called standard input (stdin), standard output (stdout), and standard error (stderr). Generally these files represent keyboard input, screen output, and error output respectively, but they can be redirected, causing the input to our program to come from the output of another, for example, as when used in a shell pipe ('|'). Similarly our program's output could be redirected to a file, or piped into the input of another process. However, all these effects are transparent to us: the print statement actually writes to stdout, and the raw_input function reads from stdin. If we want to write to stderr however, we must do so explicitly. The three standard streams are all accessible via the sys module as
sys.stdin: standard input
sys.stdout: standard output
sys.stderr: standard error. This stream exists to differentiate error output from normal output, so that we can use the shell to split off errors from processes executed in large batches, and not have to deal with all the normal output they produce. This stream is usually written to the screen, unless it has been redirected, which means that programs can generally issue error messages to the screen even if their normal output has been redirected.
Each of these is a file object, so sys.stdin, which is opened only for reading, has the usual .read, .readline, and .readlines methods, whilst the other two, stdout and stderr, being opened only for writing, have the .write and .writelines methods available. Here's an example of how to use stdout and stderr to split output effectively.
#!/usr/bin/python
import sys
#get a line of input from the user
print "Please enter a phrase: " #this gets sent to stdout
#this gets read from stdin; strip() removes the trailing newline
phrase = sys.stdin.readline().strip()
if phrase.isdigit():
    sys.stderr.write("Phrase entered was a number\n") #this gets sent to stderr
else:
    sys.stdout.write("Phrase contains %i 'a' characters\n"%phrase.count('a'))
There's one more feature of the print statement that now becomes important, namely redirection. We can specify a standard stream (or in fact any file object open for writing) to print to, instead of standard output. The syntax is:
print >> <file object>, <expression>[, <expression>...]
So the above example could be redone as follows:
#!/usr/bin/python
import sys
#get a line of input from the user
print "Please enter a phrase: " #this gets sent to stdout
#this gets read from stdin; strip() removes the trailing newline
phrase = sys.stdin.readline().strip()
if phrase.isdigit():
    print >> sys.stderr, "Phrase entered was a number" #this gets sent to stderr
else:
    print "Phrase contains %i 'a' characters"%phrase.count('a')
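Since the standard streams are ordinary file objects, the same reporting code can write to any object that has a .write method. The sketch below passes the stream in as a parameter and uses an in-memory StringIO object in place of a real file to show this. The function name report is invented here, and .write is used instead of the print redirection syntax only so that the sketch runs unchanged under both Python 2 and Python 3.

```python
import sys

try:
    from StringIO import StringIO   # Python 2
except ImportError:
    from io import StringIO         # Python 3

def report(stream, phrase):
    # Write the verdict about phrase to whichever stream we are given.
    text = phrase.strip()
    if text.isdigit():
        stream.write("Phrase entered was a number\n")
    else:
        stream.write("Phrase contains %i 'a' characters\n" % text.count("a"))

# Send one report to standard output...
report(sys.stdout, "banana")

# ...and capture another in memory, as if it were a file open for writing.
capture = StringIO()
report(capture, "12345")
print(capture.getvalue())
```

Passing sys.stderr as the first argument would send the message to standard error instead, with no change to the function itself.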
sys.argv[0]
? What are they?