Introductory Programming in Python: Lesson 12
Strings in Depth


Strings as Sequences

Strings can be thought of as sequences of characters, just as lists are sequences of arbitrary values. As such, many of the operations and methods that work on lists also work on strings. In fact, strings have more functionality associated with them, because manipulating text involves many common and useful tasks that are specific to characters (as opposed to values of arbitrary type). We'll start with the ones familiar from lists. Note that the complete list of string methods is available in the Python documentation, which also describes optional parameters not discussed here for the sake of brevity.
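As a minimal sketch of the "strings are sequences" idea (the variable names are made up for illustration, and the snippet is written to run on a current Python 3 interpreter), the same operations can be applied to a string and to a list of its characters:

```python
word = "spam"
letters = list(word)        # ['s', 'p', 'a', 'm']

first = word[0]             # 's', the same as letters[0]
middle = word[1:3]          # 'pa'; slicing a string gives a string
length = len(word)          # 4, the same as len(letters)
has_a = "a" in word         # True; membership tests work on both

# count() and index() exist on lists and on strings alike.
same_count = word.count("a") == letters.count("a")
same_index = word.index("m") == letters.index("m")
```

Both comparisons at the end come out True, since the list and the string hold the same characters in the same order.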

<string>.count(<substring>) returns the number of times substring occurs within the string.
<string>.find(<substring>) returns the index within the string of the first (from the left) occurrence of 'substring'. Returns -1 if substring cannot be found.
<string>.rfind(<substring>) returns the index within the string of the last (from the right) occurrence of 'substring'. Returns -1 if substring cannot be found.
<string>.index(<substring>) returns the index within the string of the first (from the left) occurrence of 'substring'. Causes an error if substring cannot be found.
<string>.rindex(<substring>) returns the index within the string of the last (from the right) occurrence of 'substring'. Causes an error if substring cannot be found.

Python 2.4.3 (#1, Oct 2 2006, 21:50:13) 
[GCC 3.4.6 (Gentoo 3.4.6-r1, ssp-3.4.5-1.0, pie-8.7.9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = "The quick brown fox jumps slowly over the lazy cow"
>>> s.count("ow")
3
>>> s.find("brown")
10
>>> s.find("not here")
-1
>>> s.find("ow")
12
>>> s.rfind("ow")
48
>>> s.index("ow")
12
>>> s.rindex("ow")
48
>>> s.rindex("not here")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: substring not found
>>>
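
The practical difference between find() and index() is how a missing substring is reported. The following is a small sketch of that difference, reusing the sentence from above; the variable names are only illustrative, and it assumes you are comfortable with try/except.

```python
s = "The quick brown fox jumps slowly over the lazy cow"

found = s.find("fox")            # 16
missing = s.find("not here")     # -1, and no error is raised

# Pitfall: -1 is itself a valid index (the last character), so always
# test the result of find() before using it as an index.
last_char = s[missing]           # 'w'

# index() reports a missing substring by raising ValueError instead.
try:
    position = s.index("not here")
except ValueError:
    position = None
```

Which one to use is a matter of taste: find() suits a quick "is it there?" test, while index() makes it harder to ignore a failed search by accident.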

Formatting Strings using String Methods

The most commonly used string methods are those that change the format of text. With these methods we can change the case of characters according to common patterns, pad the text with spaces on the left or right to justify it (or on both sides to center it across a given width), and strip out whitespace in various ways.

<string>.capitalize() returns a copy of the string with the first character in uppercase and all other characters in lowercase.
<string>.swapcase() returns a copy of the string with every character's case inverted.
<string>.center(<width>) returns a string of width 'width' with the original string centered, i.e. equally padded with spaces on the left and right, within it.
<string>.ljust(<width>) returns the original string left justified within a string of width 'width', i.e. padded with spaces up to length 'width'.
<string>.rjust(<width>) returns the original string right justified within a string of width 'width', i.e. padded on the left with spaces to make a string of length 'width'.
<string>.lower() returns a copy of the original string, but with all characters in lowercase.
<string>.upper() returns a copy of the original string, but with all characters in uppercase.
<string>.strip() returns a copy of the string with all whitespace at the beginning and end of the string stripped away.
<string>.lstrip() returns a copy of the string with all whitespace at the beginning of the string stripped away.
<string>.rstrip() returns a copy of the string with all whitespace at the end of the string stripped away.
<string>.replace(<old>, <new>) returns a copy of the string in which all non-overlapping instances of 'old' are replaced by 'new'.

>>> "a sentence poorly capitalized".capitalize()
'A sentence poorly capitalized'
>>>
>>> "aBcD".swapcase()
'AbCd'
>>>
>>> "center me please".center(60)
'                      center me please                      '
>>>
>>> "I need some justification here".ljust(60)
'I need some justification here                              '
>>>
>>> "No! Real Justification, the RIGHT justification".rjust(60)
'             No! Real Justification, the RIGHT justification'
>>>
>>> "LOWER me Down".lower()
'lower me down'
>>>
>>> "raise Me UP".upper()
'RAISE ME UP'
>>>
>>> "     I put my whitespace left, I put my whitespace right     ".strip()
'I put my whitespace left, I put my whitespace right'
>>>
>>> "     I strip it all off, and I shake all about      ".lstrip()
'I strip it all off, and I shake all about      '
>>>
>>> "      and now I've been arrested for indecent exposure     ".rstrip()
"      and now I've been arrested for indecent exposure"
>>>
>>> "Sung to the tune of 'The h0ky p0ky'".replace("0ky","okey")
"Sung to the tune of 'The hokey pokey'"
>>>
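
These methods are often chained together. As a rough sketch (the stock data and the column widths of 11 and 4 are invented for this example), the following tidies up some untidy names and lines them up in two columns under a centered heading:

```python
# Invented sample data: names with stray whitespace and odd capitalisation.
rows = [("  apples ", "12"), ("PEARS", "7"), (" kiwi", "130")]

lines = ["Stock".center(15)]
for name, quantity in rows:
    # strip() removes the padding, capitalize() normalises the case.
    tidy_name = name.strip().capitalize()
    # Name on the left (11 wide), quantity on the right (4 wide).
    lines.append(tidy_name.ljust(11) + quantity.rjust(4))

for line in lines:
    print(line)
```

Every line comes out exactly 15 characters wide, so the quantities line up on their last digit.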

Formatting Strings using the Interpolation Operator

After all that, let's cut to the chase: the interpolation operator on strings. This provides the majority of string formatting operations in a single consistent pattern. Learn it, understand it, appreciate its inner beauty!

Formally put, the interpolation operator interpolates a sequence of values (i.e. a tuple, or in some special cases a dictionary) into a string containing interpolation points (placeholders). Note that a list will not do: it is treated as a single value rather than as a sequence of values. Wowsers, we say? Again, in English? The interpolation operator combines a string containing certain codes and a sequence containing values, such that those values are inserted into their respective positions within the string, defined by the position of the codes, formatted according to the specification of those codes, and replacing those codes... Example time.

>>> s = "My very %s monkey jumps swiftly under %i planets" % ("energetic", 9)
>>> s
'My very energetic monkey jumps swiftly under 9 planets'
>>>

Examining the above example, we had a string containing two strange % thingies, and a tuple containing 2 elements. Spot the correlation! 2 % thingies, 2 elements. When combined using the '%' operator, the contents of the tuple were 'merged into' the string at the points where the % thingies were, at their respective positions (by relative position left to right), replacing the % thingies.
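
The "2 % thingies, 2 elements" rule is enforced by Python. Here is a short sketch of what happens when the counts match and when they do not; the variable names are purely illustrative, and the snippet assumes try/except is familiar.

```python
template = "My very %s monkey jumps swiftly under %i planets"

# Two placeholders, two values: this works.
ok = template % ("energetic", 9)

# Two placeholders, one value: Python raises TypeError.
try:
    template % ("energetic",)
    too_few_values = False
except TypeError:
    too_few_values = True

# With exactly one placeholder, a bare value may be used without a tuple...
single = "Just %i planets" % 9

# ...unless the value is itself a tuple, which must then be wrapped
# in a one-element tuple (note the trailing comma).
pair = "A pair: %s" % ((1, 2),)
```

Here single is 'Just 9 planets' and pair is 'A pair: (1, 2)'. Writing the one-element tuple explicitly, as in (9,), is never wrong and avoids the special case altogether.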

Time to get technical. And thingie is not a technical term, except amongst electrical engineers and biochemists. So firstly, the % thingie in the string is called a conversion specification. This is because all values in the sequence are converted to strings during the merge. It has a specific format, namely it starts with a '%' symbol, and must be at least two characters. It's easier to show the complete format in point form, so here it is...

%
Conversion specifications must start with the '%' symbol.
(<mapping/key name>) *optional
The '%' may optionally be followed by a key name, in parentheses, from a dictionary used in the interpolation (i.e. instead of a tuple). This is required if you use a dictionary to interpolate, as dictionary key order is not defined, so order cannot be used to relate conversion specifications to their respective elements in the dictionary.
#, 0, -, ' ', + *optional
An optional conversion flag may be used to specify justification and signedness options. Any number of '0', '-', '+', and ' ' can be used in a given conversion specification, and the important ones are:
- '0': left pad with zeroes, useful for month numbers
- '-': left justify, i.e. pad on the right with spaces; overrides '0' if both given
- '+': force the use of a plus sign in front of positive numbers
- ' ': insert a space in front of positive numbers (used to line up with negative numbers where a minus is placed in front.)
<field width> *optional
An optional minimum field width. Whatever value is merged in at this point in the string is converted to a string that is at least as wide as the field width, specified as an integer. Note that because the number is inside a string, it must be hard coded and cannot be an expression (the documentation describes a '*' form that reads the width from the tuple instead).
.<precision> *optional
An optional precision can be specified (in digits). For floats converted with 'f' or 'e', this is the number of digits shown after the decimal point: the value is rounded to that many digits, and padded with trailing zeros if it has fewer.
<Conversion Type> mandatory
The conversion type character is a single character specifying the type of value to convert into a string and how the conversion should happen. The complete list of valid characters can be found on the documentation page, but the important ones are:
- 'i': convert an integer
- 'e': convert a float to scientific notation
- 'f': convert a float to decimal notation
- 's': convert any value to a string, as str() would; most often used for strings
- '%': convert nothing, just insert a '%'

>>> "An integer with field width of three: %3i"%(5,)
'An integer with field width of three:   5'
>>>
>>> "An integer left justified: %-3i"%(5,)
'An integer left justified: 5  '
>>>
>>> "An integer with leading zeros: %03i"%(5,)
'An integer with leading zeros: 005'
>>>
>>> "An integer right justified with forced +: %+3i"%(5,)
'An integer right justified with forced +:  +5'
>>>
>>> "A float: %f"%2.5
'A float: 2.500000'
>>>
>>> "A float: %.1f"%2.5
'A float: 2.5'
>>>
>>> "A float: %4.1f"%2.5
'A float:  2.5'
>>>
>>> "A float: %04.1f"%2.5
'A float: 02.5'
>>>
>>> "A float in sci notation: %06.1e"%(0.0000025)
'A float in sci notation: 2.5e-06'
>>>
>>> "A percentage symbol: %% %s"%(" ")
'A percentage symbol: %  '
>>>
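
The examples above all interpolate from a tuple. As a sketch of the dictionary form, and of how precision behaves, consider the following; the dictionary and its keys are invented for illustration.

```python
# Invented example data; the key names are arbitrary.
order = {"item": "tea", "quantity": 3, "price": 2.5}

# Each conversion specification names the key whose value it wants.
line = "%(quantity)i x %(item)s at %(price).2f each" % order
# line is '3 x tea at 2.50 each'

# A key may be used more than once; keys that are not mentioned are ignored.
echo = "%(item)s, %(item)s!" % order     # 'tea, tea!'

# Precision rounds the value and pads it with trailing zeros.
rounded = "%.2f" % 2.678                 # '2.68'
padded = "%.3f" % 2.5                    # '2.500'
```

Because the values are looked up by name, the order of the keys in the dictionary does not matter.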

Miscellaneous String Methods

Finally, there are a few miscellaneous methods that prove very useful when dealing with strings. These include

<string>.isupper() returns True if the string contains at least one letter and all of its letters are uppercase.
<string>.islower() returns True if the string contains at least one letter and all of its letters are lowercase.
<string>.isalpha() returns True if the string is not empty and contains only alphabetic characters.
<string>.isalnum() returns True if the string is not empty and contains only alphabetic characters and/or digits.
<string>.isdigit() returns True if the string is not empty and contains only digits.
<string>.isspace() returns True if the string is not empty and contains only whitespace characters.
<string>.endswith(<substring>) returns True if the string ends with the substring 'substring'.
<string>.startswith(<substring>) returns True if the string starts with the substring 'substring'.
<string>.join(<sequence>) returns the elements of 'sequence' (which must be strings) concatenated in order with the string between each element.
<string>.split([substring]) returns a list of strings, such that the string is split by 'substring' and each portion is an element of the returned list. If substring is not specified, the string is split on whitespace.
<string>.rsplit([substring]) is the same as split, but the search for the split string is performed from right to left.

>>> "The quick brown fox".endswith("dog")
False
>>>
>>> "The quick brown fox".endswith("fox")
True
>>>
>>> "The quick brown fox".startswith("A")
False
>>>
>>> "The quick brown fox".startswith("The ")
True
>>>
>>> ", ".join(['1', '2', '3', '4'])
'1, 2, 3, 4'
>>> 
>>> "a, b, c, d".split(',')
['a', ' b', ' c', ' d']
>>>
>>> "a, b, c, d".split(', ')
['a', 'b', 'c', 'd']
>>> 
>>> "abababa".split("bab")
['a', 'aba']
>>>
>>> "abababa".rsplit("bab")
['aba', 'a']
>>>
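
Several of these methods combine naturally when cleaning up input. The following sketch (the input string and variable names are made up) splits a comma separated line, keeps only the pieces that are whole numbers, and joins them back into a sum:

```python
raw = " 12, 7 ,abc,  130 "

valid = []       # pieces consisting only of digits
rejected = []    # everything else
total = 0
for piece in raw.split(","):
    piece = piece.strip()        # drop the whitespace around each piece
    if piece.isdigit():
        valid.append(piece)
        total += int(piece)
    else:
        rejected.append(piece)

# join() needs strings, which is why valid holds the pieces as text.
summary = "%s = %i" % (" + ".join(valid), total)
# summary is '12 + 7 + 130 = 149'
```

Checking with isdigit() before calling int() avoids an error on pieces such as 'abc', which end up in rejected instead.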

Exercises

Strings are immutable. How can one change the contents of a string type variable, for example to insert '-)' after every colon?
If we try to use the interpolation operator with a tuple, the tuple must have the same number of elements as there are '%' characters in the string. Is this the case with dictionaries? Why?
Write a program that reads in names until a blank line is encountered, after which it prints out an English style list, i.e. a comma separated list, where the last name is preceded by the word 'and' instead of a comma.
Write a program that reads in a line of space separated names, after which it prints out an English style list, i.e. a comma separated list, where the last name is preceded by the word 'and' instead of a comma.
What is the value of "Laziness is a %s."%("virtue")?
What is the value of "%i days hath %s, %s, %s and %s. I use my %s for the other %i, because I can't remember this rhyme for %s"%(30, "September", "April", "June", "November", "knuckles", 8, "...")?
What is the value of "%02i/%02i/%04i"%(10,3,2009)?
What is the value of "%5.3f"%(3.1415)?
How would you print a column of numbers so they line up right justified for convenient addition?
How would you print a user entered string centered in the middle of the console?
Write a program that reads in the name, price, and quantity of an item, and stores it in a list of tuples, repeating until a blank product name is entered. It should then print out each item in a nicely formatted manner, using string interpolations.
Modify your answer to the previous exercise to use products from a dictionary of product codes mapped to product descriptions. Invalid codes should print a warning, and product codes should be integers. The printout at the end should print the full name of the item, followed in brackets by its code, as well as price and quantity.