Author: Guido van Rossum
Developer: Python Software Foundation
Filename extensions: .py, .pyc, .pyo

Python is a dynamically typed, general-purpose scripting language that supports both object-oriented and functional programming. Created by Guido van Rossum in 1991, it is "an interpreted, interactive, object-oriented programming language that combines remarkable power with very clear syntax" [56].


1   Etymology

Python gets its name from `Monty Python's Flying Circus`_ [18]:

Over six years ago, in December 1989, I was looking for a "hobby" programming project that would keep me occupied during the week around Christmas. My office (a government-run research lab in Amsterdam) would be closed, but I had a home computer, and not much else on my hands. I decided to write an interpreter for the new scripting language I had been thinking about lately: a descendant of ABC that would appeal to Unix / C hackers. I chose Python as a working title for the project, being in a slightly irreverent mood (and a big fan of Monty Python's Flying Circus).

2   Matter

Python consists of three parts:

  1. The Python programming language
  2. The Python runtime environment
  3. A Python virtual machine (the program_ which actually runs Python scripts)

2.2   Lexical analysis

A Python program is read by a parser. Input to the parser is a stream of tokens, generated by the lexical analyzer. [17]

A Python program is divided into a number of logical lines. [17]

The end of a logical line is represented by the token NEWLINE.

2.2.1   Literals

Literals are notations for constant values of some built-in types.

2.2.2   Operators

Comparison operators can be chained, e.g. if x > y and y > z can be written as if x > y > z.

Python has four bitwise operators (besides the shift operators << and >>): &, |, ^, ~

  • for ints, & performs the AND operation on the binary representations of both operands and returns the result as an int; similarly for | (OR), ^ (XOR), and ~ (NOT)
  • for boolean expressions, they act as expected. Note that, unlike and, & does not short-circuit, which can lead to unexpected results.
  • for sets, use |, &, -, ^ for union, intersection, difference, and symmetric difference
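A quick sketch of these operators on ints and booleans, including the short-circuiting difference:

```python
# Bitwise operators on ints work on their binary representations.
assert 12 & 10 == 8    # 0b1100 & 0b1010 == 0b1000
assert 12 | 10 == 14   # 0b1100 | 0b1010 == 0b1110
assert 12 ^ 10 == 6    # 0b1100 ^ 0b1010 == 0b0110
assert ~12 == -13      # ~x == -(x + 1) in two's complement

# On booleans, & evaluates both operands; `and` short-circuits.
def loud_true(log):
    log.append("evaluated")
    return True

log = []
result = False and loud_true(log)   # right side never runs
assert log == []
result = False & loud_true(log)     # right side always runs
assert log == ["evaluated"]
```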

2.3   Data model

The principal built-in types are numerics, sequences, mappings, classes, instances and exceptions.

Historically (until release 2.2), Python's built-in types have differed from user-defined types because it was not possible to use the built-in types as the basis for object-oriented inheritance. This limitation does not exist any longer. [21]

2.3.1   Objects, values and types

Objects are Python's abstraction for data. All data in a Python program is represented by objects or by relations between objects. (In a sense, and in conformance to Von Neumann's model of a stored program computer, code is also represented by objects.) [20]

Every object has an identity, a type and a value. An object's identity never changes once it has been created; you may think of it as the object's address in memory. The is operator compares the identity of two objects; the id() function returns an integer representing its identity (currently implemented as its address). An object's type is also unchangeable. [20]
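Identity versus equality can be sketched in a few lines:

```python
a = [1, 2]
b = a          # b is a second name for the same object
c = [1, 2]     # a distinct object with an equal value

assert a is b and id(a) == id(b)   # same identity
assert a == c                      # equal values...
assert a is not c                  # ...but different identities
```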

An object's type determines the operations that the object supports (e.g., "does it have a length?") and also defines the possible values for objects of that type. [20]

The type() function returns an object's type (which is an object itself). [20]

The value of some objects can change. [20] Objects whose value can change are said to be "mutable"; objects whose value is unchangeable once they are created are called "immutable". [20] (The value of an immutable container object that contains a reference to a mutable object can change when the latter's value is changed; however the container is still considered immutable, because the collection of objects it contains cannot be changed. So, immutability is not strictly the same as having an unchangeable value, it is more subtle.) [20] An object's mutability is determined by its type; for instance, numbers, strings and tuples are immutable, while dictionaries and lists are mutable. [20]

>>> a = []
>>> b = a
>>> b.append(0)
>>> a
[0]
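The container subtlety described above can be sketched with a tuple holding a list:

```python
t = ([1], "text")   # a tuple (immutable) containing a list (mutable)
t[0].append(2)      # the list's value changes...
assert t == ([1, 2], "text")

try:
    t[0] = [9]      # ...but the tuple's slots cannot be reassigned
except TypeError:
    pass
else:
    raise AssertionError("tuples do not support item assignment")
```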

Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. [20]

2.3.2   The standard type hierarchy

Below is a list of the types that are built into Python.

None

None evaluates to False.

Numbers

A number is a sequence of digits, optionally preceded by a base indicator (0o for octal, 0x for hex, or 0b for binary).

Numbers can be represented in scientific notation, e.g. 10e4 == 100000.

Integers - int

Booleans - bool

Historically, logical true/false operations tended to simply use 0 for false and 1 for true; in the course of Python 2.2's life-cycle, Guido noticed that too many modules started with assignments such as false = 0; true = 1 and this produced boilerplate and useless variation (the latter because the capitalization of true and false was all over the place -- some used all-caps, some all-lowercase, some cap-initial) and so introduced the bool subclass of int and its True and False constants.

—Alex Martelli

The bool type is a subtype of the int type, and the values False and True behave like 0 and 1 in most respects, except repr() and str(). [46]
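These relationships are easy to check directly:

```python
assert isinstance(True, int)     # bool is a subtype of int
assert True == 1 and False == 0
assert True + 1 == 2             # arithmetic treats True like 1
assert str(True) == 'True'       # but str()/repr() differ from int's
assert repr(False) == 'False'
```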

Many programmers apparently feel the need for a Boolean type; most Python documentation contains a bit of an apology for the absence of a Boolean type. I've seen lots of modules that defined constants "False=0" and "True=1" (or similar) at the top and used those. The problem with this is that everybody does it differently. For example, should you use "FALSE", "false", "False", "F" or even "f"? And should false be the value zero or None, or perhaps a truth value of a different type that will print as "true" or "false"? Adding a standard bool type to the language resolves those issues.

The standard bool type can also serve as a way to force a value to be interpreted as a Boolean, which can be used to normalize Boolean values. When a Boolean value needs to be normalized to one of two values, bool(x) is much clearer than "not not x" and much more concise than

if x:
    return 1
else:
    return 0

Here are some arguments derived from teaching Python. When showing people comparison operators etc. in the interactive shell, I think this is a bit ugly:

>>> a = 13
>>> b = 12
>>> a > b
1

If this was:

>>> a > b
True

it would require a millisecond less thinking each time a 0 or 1 was printed.

—Guido van Rossum, PEP 285


Other languages (C99, C++, Java) name the constants "false" and "true", in all lowercase. For Python, I prefer to stick with the example set by the existing built-in constants, which all use CapitalizedWords: None, Ellipsis, NotImplemented (as well as all built-in exceptions). Python's built-in namespace uses all lowercase for functions and types only.

—Guido van Rossum, PEP 285


Because bool inherits from int, True+1 is valid and equals 2, and so on. This is important for backwards compatibility: because comparisons and so on currently return integer values, there's no way of telling what uses existing applications make of these values.

—Guido van Rossum, PEP 285

Python 2 does not guarantee that False equals 0 and True equals 1; they are ordinary names that a malicious user may rebind.

Sequences

The sequence types are lists, strings, and tuples.

Sequences can hold any type.

  • len
  • indexing
  • slicing
  • iteration
  • note: can't change sequence in place

Any iterable can take advantage of tuple-unpacking.

  • not restricted to tuples -- any iterable will work
  • if we have a tuple t = (1, 2, 3) we can unpack it like a, b, c = t, giving us a=1, b=2, c=3
  • when passing tuples to functions which take multiple parameters, tuples can be unpacked with *, i.e. *args

For example:

>>> a, b = xrange(2)
>>> a, b
(0, 1)
>>> x, y = {'a': 1, 'b': 2}
>>> x, y
('a', 'b')
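Unpacking a tuple into a function call with * can be sketched as:

```python
def add(a, b, c):
    return a + b + c

t = (1, 2, 3)
assert add(*t) == 6   # the tuple is unpacked into three positional arguments
```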

Slice objects:

>>> x = slice(-3, None)
>>> a = range(6)
>>> a[x]
[3, 4, 5]

An empty sequence has a truth value of False.

Immutable sequences

An object of an immutable sequence type cannot change once it is created. (If the object contains references to other objects, these other objects may be mutable and may be changed; however, the collection of objects directly referenced by an immutable object cannot change.)

Python has three kinds of immutable sequences: byte strings, Unicode objects, and tuples. Strings and Unicode objects are both subclasses of basestring.

There are two types of strings: byte (8-bit) strings (which look like 'foo' on 2.x) and Unicode strings (which have a leading 'u' prefix, e.g. u'foo'). Since 2.6 you can also be explicit about byte strings and write them with a leading 'b' prefix: b'foo'. [29] Both types of string are kinds of basestring.

String - str

The str built-in type is a subclass of basestring that represents byte strings. The items of a string are characters. There is no separate character type; a character is represented by a string of one item.

The built-in functions chr() and ord() convert between characters and non-negative integers representing the byte values. Bytes with the values 0-127 usually represent the corresponding ASCII values, but the interpretation of values is up to the program.

For performance reasons, prefer " ".join(("Hi", first_name, last_name)) over "Hi " + first_name + " " + last_name. Repeated string concatenation runs in O(n**2), whereas str.join concatenates a sequence of strings in O(n).
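A sketch of the join idiom (the names here are illustrative):

```python
first_name, last_name = "Ada", "Lovelace"

# One pass over the data, regardless of how many pieces there are.
greeting = " ".join(("Hi", first_name, last_name))
assert greeting == "Hi Ada Lovelace"

# Joining n pieces is linear; chained + copies partial results repeatedly.
pieces = ["x"] * 1000
assert len("".join(pieces)) == 1000
```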

String formatting:

    print "My name is " + first_name + " " + last_name
    print "My name is %s %s" % (first_name, last_name)

Apparently a literal non-breaking space doesn't work. Use u"\u00A0" instead.

Unicode - unicode
Surrogate pairs may be present in the Unicode object, and will be reported as two separate items. [32]

The unicode built-in type is a subclass of basestring that represents Unicode strings. [31]

The unicode() constructor has the signature unicode(string[, encoding, errors]). All of its arguments should be byte strings:

string
    The byte string to convert using the specified encoding.

encoding
    Optional. The encoding to use when converting the source string. Defaults to 'ascii' (which treats bytes greater than 127 as errors) if no encoding is specified [31]:

>>> unicode('abcdef')
u'abcdef'
>>> type(unicode('abcdef'))
<type 'unicode'>
>>> unicode('abcdef' + chr(255))
Traceback (most recent call last):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 6:
ordinal not in range(128)

Encodings are specified as strings containing the name of the character encoding. Python 2.7 comes with roughly 100 different encodings; see the Python Library Reference at Standard Encodings for a list. Some encodings have multiple names; for example, latin-1, iso_8859_1 and 8859 are all synonyms for the same encoding. [31]


errors
    Optional. Specifies the response when the input string can't be converted according to the rules of the encoding. Legal values for this argument are 'strict' (raise a UnicodeDecodeError exception), 'replace' (add U+FFFD, 'REPLACEMENT CHARACTER'), or 'ignore' (just leave the character out of the Unicode result). [31] Defaults to 'strict'. The following examples show the differences:

>>> unicode('\x80abc', errors='strict')
Traceback (most recent call last):
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0:
ordinal not in range(128)
>>> unicode('\x80abc', errors='replace')
u'\ufffdabc'
>>> unicode('\x80abc', errors='ignore')
u'abc'

Under the hood, Python represents Unicode strings as a sequence of Unicode code units. [32] A Unicode code unit is represented by a Unicode object of one item. Code units can be created with the built-in unichr() function, which takes an integer and returns a one-character Unicode string that contains the corresponding code point. [31] The code point is either a 16-bit or 32-bit value representing a Unicode ordinal (the maximum value for the ordinal is given in sys.maxunicode, and depends on how the Python interpreter was compiled). [31] [32] The reverse operation is the built-in ord() function which takes a one-character Unicode string and returns a non-negative integer (the code point value) that represents the Unicode ordinal [31] [32]:

>>> unichr(40960)
u'\ua000'
>>> ord(u'\ua000')
40960

Conversion from and to other encodings is possible through the Unicode method encode() and the built-in function unicode(). [32]

Instances of the unicode type have many of the same methods as the byte string type for operations such as searching and formatting [31]:

>>> s = u'Was ever feather so lightly blown to and fro as this multitude?'
>>> s.count('e')
5
>>> s.find('feather')
9
>>> s.find('bird')
-1
>>> s.replace('feather', 'sand')
u'Was ever sand so lightly blown to and fro as this multitude?'
>>> s.upper()
u'WAS EVER FEATHER SO LIGHTLY BLOWN TO AND FRO AS THIS MULTITUDE?'

Note that the arguments to these methods can be Unicode strings or byte strings. Byte strings will be converted to Unicode before carrying out the operation; Python's default ASCII encoding will be used, so characters greater than 127 will cause an exception [31]:

>>> s.find('Was\x9f')
Traceback (most recent call last):
UnicodeDecodeError: 'ascii' codec can't decode byte 0x9f in position 3:
ordinal not in range(128)
>>> s.find(u'Was\x9f')
-1

Much Python code that operates on strings will therefore work with Unicode strings without requiring any changes to the code. (Input and output code needs more updating for Unicode; more on this later.) [31]

Python 2 had one feature that usually confused developers: a byte string, for as long as it only contained ASCII characters, could be upgraded to a Unicode string implicitly. If it was not ASCII safe, it would cause some form of UnicodeError (either a UnicodeEncodeError or a UnicodeDecodeError, depending on when it failed).

Because of all that the rule of thumb on 2.x was this:

The main difference between Python 2 and Python 3 is the basic types that exist to deal with texts and bytes. On Python 3 we have one text type: str which holds Unicode data and two byte types bytes and bytearray. [30]

On the other hand on Python 2 we have two text types: str which for all intents and purposes is limited to ASCII + some undefined data above the 7 bit range, unicode which is equivalent to the Python 3 str type and one byte type bytearray which it inherited from Python 3. [30]

Python 2, like many languages before it, was created without support for dealing with strings of different encodings. A string was a string and it contained bytes. It was up to the developer to properly deal with different encodings manually. This actually works remarkably fine for many situations. The Django framework for many years did not support Unicode at all and used the byte-string interface in Python exclusively. [30]

Here are some situations where the default encoding kicks in:

  1. String concatenation [30]:

    >>> "Hello " + u"World"
    u'Hello World'
    >>> "Hello\xff " + u"World"
    Traceback (most recent call last):
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 5: ordinal not in range(128)

    Basically, this is equivalent to:

    >>> unicode("Hello\xff", 'ascii') + u"World"
  2. Comparison between byte strings and Unicode strings. [30] The byte string will be decoded to Unicode and then compared. In case the left side cannot be decoded, it will warn and return False:

    >>> "foo" == u"foo"
    True
    >>> "foo\xff" == u"foo\xff"
    __main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
    False
  3. Implicit decoding as part of a codec [30]:

    >>> "foo".encode('utf-8')
    'foo'
    >>> "\xff".encode('utf-8')
    Traceback (most recent call last):
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)

    Here the string is obviously a byte-string. We ask it to encode to UTF-8. This by itself makes no sense because the UTF-8 codec encodes from Unicode to UTF-8 bytes. So how does this work? It works because the UTF-8 codec sees that the object is not a Unicode string and first performs a coercion to Unicode through the default codec. Since "foo" is ASCII only and the default encoding is ASCII this coercion will succeed and then the resulting u"foo" string will be encoded through UTF-8. [30]

Python 2's biggest problem with Unicode was that some APIs did not support it. The most common ones were many filesystem operations, the datetime module, the CSV reader and quite a few interpreter internals. In addition, a few APIs only ever worked with non-Unicode strings or caused a lot of confusion if you introduced Unicode. For instance, docstrings break some tools if they are unicode instead of byte strings, and the return value of __repr__ must only ever be bytes, not unicode strings. [29]

If you include Unicode directly in a source file, you must declare the encoding to the interpreter at the top of the file:

# -*- coding: utf-8 -*-
print "Ö"

Failing to do so will result in a SyntaxError:

SyntaxError: Non-ASCII character '\xc3' in file on line 1, but no
encoding declared; see http://python.org/dev/peps/pep-0263/ for details

Alternatively, you can translate Unicode characters to their code points:

print u"h\u2211llo"

A regular Unicode expression is translated to its code point:

>>> u"Ö"
u'\xd6'

You can encode it:

>>> u"Ö".encode("utf8")
'\xc3\x96'

Encoding in the wrong codec will result in an error:

>>> u"Ö".encode("ascii")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xd6' in position 0: ordinal not in range(128)

Python automatically attempts to encode Unicode strings as ASCII:

>>> with open("test.txt", "w") as f:
...   f.write(u"Ö")
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xd6' in position 0: ordinal not in range(128)

This can be remedied by encoding first or by using codecs.open:

>>> with open("test.txt", "w") as f:
...   f.write(u"Ö".encode("utf8"))
>>> import codecs
>>> with codecs.open("test.txt", "w", "utf8") as f:
...   f.write(u"Ö")

Python automatically tries to decode byte strings as ASCII:

>>> "Ö".encode("utf8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

This can be fixed by either checking the type or by decoding to UTF-8 first:

>>> "Ö".decode("utf8").encode("utf8")
'\xc3\x96'

Mutable sequences

List - list

Lists are mutable.

Element access is O(1).

Set types

These represent unordered, finite sets of unique, immutable objects. As such, they cannot be indexed by any subscript. [20] However, they can be iterated over, and the built-in function len() returns the number of items in a set. Common uses for sets are fast membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference. [20]
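The common uses mentioned above, as a quick sketch:

```python
A = {1, 2, 2, 3}               # duplicates collapse on construction
assert A == {1, 2, 3}
assert len(A) == 3
assert 2 in A                  # fast membership testing

B = {3, 4}
assert A | B == {1, 2, 3, 4}   # union
assert A & B == {3}            # intersection
assert A - B == {1, 2}         # difference
assert A ^ B == {1, 2, 4}      # symmetric difference
assert (A ^ B) == ((A - B) | (B - A))
```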

A set is a dictionary without storage for values. So everything that applies to dictionaries also applies to sets.

>>> (A ^ B) == ((A - B) | (B - A))

Because set membership is based on __hash__ and __eq__, a class that breaks the usual hashing contract produces surprising results. Consider a module breakhash containing:

class Foo(object):
    def __init__(self, x):
        self.x = x

    def __hash__(self):
        return 1

    def __eq__(self, other):
        return self.x == other.x

>>> from breakhash import Foo
>>> a = set()
>>> b = Foo(5)
>>> c = Foo(6)
>>> b == c
False
>>> a.add(b)
>>> a
set([<breakhash.Foo object at 0x10ba79710>])
>>> c in a
False
>>> a.add(c)
>>> a
set([<breakhash.Foo object at 0x10ba79710>, <breakhash.Foo object at 0x10ba79810>])
>>> d = Foo(5)
>>> d in a
True

Set - set

These represent a mutable set. They are created by the built-in set() constructor and can be modified afterwards by several methods, such as add(). [20]

Frozen set - frozenset

These represent an immutable set. They are created by the built-in frozenset() constructor. As a frozenset is immutable and hashable, it can be used again as an element of another set, or as a dictionary key. [20]

Mappings

These represent finite sets of objects indexed by arbitrary index sets. The subscript notation a[k] selects the item indexed by k from the mapping a; this can be used in expressions and as the target of assignments or del statements. The built-in function len() returns the number of items in a mapping. [20]

Dictionaries

These represent finite sets of objects indexed by nearly arbitrary values. [20]

  • Dictionary key ordering is based on history.
  • Dicts are initialized with room for 8 items.
  • Lookup initially uses the last three bits of the hash to choose a slot (the table starts with 8 slots).
  • Key ordering can be different for same keys in different dictionaries but dicts are still equal.
  • Not all lookups are created equal. Some will be expensive depending on number of collisions.
  • When deleting a key, dummy keys are created in place. (Because of lookups and collisions.)
  • Dicts resize when 2/3 full. After resizing, collision rate goes to 0% since more bits are used to hash (and previous hashes become unique).
  • Because of resizing, a dictionary can completely reorder during an otherwise innocent insert.
  • Because an insert can radically reorder a dictionary, key insertion is prohibited during iteration.
  • To use an object as a key, the object must implement __hash__ and __eq__.
  • Equal values (should) have equal hashes regardless of their type.
  • Dictionaries don't shrink when keys are deleted. They resize only on inserts. If you want to reclaim space, copy the items to a new dict.
  • keys must be hashable (in practice, immutable)
  • "in" lookups are O(1)
    • looks at keys, not values
  • besides standard usage, can also be used like dict([(a, b), ..., (x, y)]) (2-tuples)
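A few of these points as a sketch:

```python
d = dict([("a", 1), ("b", 2)])   # built from a list of 2-tuples
assert d == {"a": 1, "b": 2}

assert "a" in d        # 'in' looks at keys...
assert 1 not in d      # ...not values

del d["b"]             # deleting leaves a dummy slot internally;
assert "b" not in d    # the dict does not shrink until a resize
```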


Two ways to invert a dictionary m (mapping values back to keys):

dict(zip(m.values(), m.keys()))
{v: k for k, v in m.items()}

Default dict:

>>> import itertools, collections
>>> value_to_numeric_map = collections.defaultdict(itertools.count().next)
>>> value_to_numeric_map['a']
0
>>> value_to_numeric_map['b']
1
>>> value_to_numeric_map['c']
2
>>> value_to_numeric_map['a']
0
>>> value_to_numeric_map['b']
1

To union two dictionaries, use the following idiom:

dict(l, **r)

Duplicates are resolved in favor of the value in r. This works because the constructor for dictionary roughly looks like this: dict(mapping_or_iterable=(), **kwargs).

Classes

Class constructors should define every attribute on the class that will ever be used; methods should not define attributes not defined in the constructor. For example, the following class is considered bad style because A.b() defines B.y, which is not defined in the constructor:

class A(object):
    def __init__(self):
        self.x = 0

    def b(self):
        self.y = 0

Constructors should do so even if the variable is never read, since it indicates to readers what variables they can expect. Constructors are also a good location to document what each variable is for.
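A sketch of the preferred style, declaring every attribute in the constructor:

```python
class A(object):
    def __init__(self):
        self.x = 0      # counter, used from the start
        self.y = None   # set for real by b(); declared here so readers
                        # see every attribute in one place

    def b(self):
        self.y = 0

a = A()
assert a.y is None   # documented, even before b() runs
a.b()
assert a.y == 0
```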

2.4   Execution model

2.4.1   Exceptions

  • When raising an exception, use raise ValueError('message') instead of the older form raise ValueError, 'message'. (The comma form is archaic and was removed in Python 3.)

2.5   Expressions

2.5.1   Conditional expressions (Ternaries)

The bad code is awkward here because, while logically x is guaranteed to be defined, syntactically that is not obvious:

    if b:
        x = u
    else:
        x = v

    x = u if b else v

Unfortunately, there is no obvious way to make this clear in the case of dangerous code. For example:

    try:
        x = u
    except E:
        x = v

2.5.2   Lambdas

lambda is an anonymous function. It is useful for functions which take a key function as an argument.
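For example, passing a key function to sorted() (the names here are illustrative):

```python
words = ["banana", "Apple", "cherry"]

# Case-insensitive sort using an anonymous key function.
assert sorted(words, key=lambda w: w.lower()) == ["Apple", "banana", "cherry"]

# The equivalent named function does exactly the same thing.
def lowercase(w):
    return w.lower()

assert sorted(words, key=lowercase) == sorted(words, key=lambda w: w.lower())
```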

Guido dislikes lambda

"Why drop lambda? Most Python users are unfamiliar with Lisp or Scheme, so the name is confusing; also, there is a widespread misunderstanding that lambda can do things that a nested function can't -- I still recall Laura Creighton's Aha!-erlebnis after I showed her there was no difference! Even with a better name, I think having the two choices side-by-side just requires programmers to think about making a choice that's irrelevant for their program; not having the choice streamlines the thought process. Also, once map(), filter() and reduce() are gone, there aren't a whole lot of places where you really need to write very short local functions;"

2.6   Grammar

2.6.1   Iterators

Defined in PEP 234.

  1. An object can be iterated over with "for" if it implements __iter__() or __getitem__().
  2. An object can function as an iterator if it implements next().

__getitem__ predates the iterator protocol, and was in the past the only way to make things iterable. As such, it's still supported as a method of iterating. Essentially, the protocol for iteration is

  1. Check for an __iter__ method. If it exists, use the new iteration protocol.
  2. Otherwise, try calling __getitem__ with successively larger integer values until it raises IndexError.

Prior to Python 2.2, (2) used to be the only way of doing this, but had the disadvantage that it assumed more than was needed to support just iteration. To support iteration, you had to support random access, which was much more expensive for things like files or network streams where going forwards was easy, but going backwards would require storing everything. __iter__ allowed iteration without random access, but since random access usually allows iteration anyway, and because breaking backward compatibility would be bad, __getitem__ is still supported. [35]
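Both halves of the protocol can be sketched as follows (UpTo and Indexed are made-up names; the next/__next__ spelling differs between Python 2 and 3):

```python
class UpTo(object):
    """An iterator yielding 1..n via the __iter__ protocol."""
    def __init__(self, n):
        self.i, self.n = 0, n

    def __iter__(self):
        return self

    def __next__(self):              # Python 3 spelling
        if self.i >= self.n:
            raise StopIteration
        self.i += 1
        return self.i

    next = __next__                  # Python 2 spelling

class Indexed(object):
    """Iterable only via the legacy __getitem__ fallback."""
    def __init__(self, items):
        self.items = items

    def __getitem__(self, k):        # called with 0, 1, 2, ... until IndexError
        return self.items[k]

assert list(UpTo(3)) == [1, 2, 3]
assert list(Indexed("ab")) == ["a", "b"]
```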

Hence, if you write:

class b:
  def __getitem__(self, k):
    return k

x = b()

for i in x:
  print i

You get as output the integers 0, 1, 2, 3, ... printed forever, since this __getitem__ never raises IndexError.


2.6.2   Generators

Generators are lazy, memory friendly, and scalable. They use the yield statement:

def gen_x():
    yield 1

>>> list(gen_x())
[1]
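Laziness means a generator can even describe an infinite sequence; consumers take only what they need. A sketch:

```python
import itertools

def naturals():
    """Infinite generator: values are produced one at a time, on demand."""
    n = 0
    while True:
        yield n
        n += 1

# Only five values are ever computed; memory use stays constant.
assert list(itertools.islice(naturals(), 5)) == [0, 1, 2, 3, 4]
```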

2.6.3   List comprehensions

A list comprehension is a compact way to process all or part of the elements in a sequence and return a list with the results.

For example, say we want to multiply each element in mylist by 2. Without a comprehension:

    mylist_copy = []
    for num in mylist:
        mylist_copy.append(num * 2)
    mylist = mylist_copy

With one:

    mylist = [2 * x for x in mylist]

List comprehensions are fast and readable (when not nested).
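Comprehensions can also filter with an if clause:

```python
evens = [x for x in range(10) if x % 2 == 0]
assert evens == [0, 2, 4, 6, 8]

# Mapping and filtering combine in one expression.
doubled_evens = [2 * x for x in range(10) if x % 2 == 0]
assert doubled_evens == [0, 4, 8, 12, 16]
```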

Be aware that list comprehensions pollute scope (in Python 2; Python 3 gives them their own scope):

def foo():
    x = True
    y = [1 for x in range(10)]
    print x

foo()  # prints 9

List comprehensions were borrowed from Haskell.

2.6.4   Decorators

Decorators are syntactic sugar. Basically, a decorator is just a function that wraps another function:

def printer(f):
    def g(*args):
        print args
        return f(*args)
    return g

@printer
def f(x):
    return x

The @printer line is shorthand for f = printer(f). Calling f(x) first prints the arguments and then returns x.

Decorators can be chained.
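Chaining applies decorators bottom-up; a sketch (twice and inc are made-up names):

```python
def twice(f):
    """Return a function that applies f two times."""
    def g(x):
        return f(f(x))
    return g

@twice
@twice
def inc(x):
    return x + 1

# inc = twice(twice(inc)): (x + 1) is applied 4 times in total.
assert inc(0) == 4
```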

2.6.5   Functions

A function declaration consists of four parts in the following order:

  1. Unnamed parameters (Positional parameters)
  2. Named parameters. These are used to provide default values. Default values should be immutable to avoid surprising behavior; when a mutable default is necessary, use None and then provide the default in the function body.
  3. Optional infinitary unnamed parameters (conventionally called *args)
  4. Optional infinitary named parameters (conventionally called **kwargs)

For example:

def f(x, y=1, *args, **kwargs):
    print x, y, args, kwargs
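The None idiom for mutable defaults can be sketched like this (append_to and bad_append are made-up names):

```python
def append_to(item, bucket=None):
    if bucket is None:
        bucket = []      # a fresh list per call, not one shared default
    bucket.append(item)
    return bucket

assert append_to(1) == [1]
assert append_to(2) == [2]         # no state leaks between calls

def bad_append(item, bucket=[]):   # the default is created once, at def time
    bucket.append(item)
    return bucket

bad_append(1)
assert bad_append(2) == [1, 2]     # surprise: calls share one list
```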


In Python 2, callers must fill regular argument slots before the vararg slot. This is not always desirable. One can easily envision a function which takes a variable number of arguments, but also takes one or more options in the form of keyword arguments. Since callers must fill regular argument slots before the vararg slot can be filled, callers always must fill out options, defeating the purpose of default values and making them equivalent to positional arguments [54]:

def sort_words(case_sensitive=False, *words):
    if not case_sensitive:
        words = [word.lower() for word in words]
    return sorted(words)

sort_words(False, 'eggs', 'Spam')

Currently, the only way to remedy this is to define both a varargs argument, and a keywords argument (**kwargs), and then manually extract the desired keywords from the dictionary [54]:

def sort_words(*words, **kwargs):
    assert not kwargs or kwargs.keys() == ['case_sensitive']
    case_sensitive = kwargs.pop('case_sensitive', False)
    if not case_sensitive:
        words = [word.lower() for word in words]
    return sorted(words)

sort_words('eggs', 'Spam')

Functions are first-class objects.

Functions implicitly return None if no value is returned, so an expression like x = f() is always well-defined even when f falls off the end of its body without an explicit return.
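A quick check of the implicit return:

```python
def f():
    x = 1        # no return statement at all

def g():
    return       # bare return, no value

assert f() is None
assert g() is None
```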

2.7   Simple statements

2.7.2   The assert statement

To express an assertion in Python, we use assert expression ["," expression]. For example:

assert x == 1, "x was not 1"

This is equivalent to:

if __debug__:
    if not x == 1:
        raise AssertionError("x was not 1")

2.7.5   The del statement

del doesn't require parentheses.
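del works on names, sequence items, slices, and mapping keys:

```python
x = [0, 1, 2, 3]
del x[0]            # remove one item
assert x == [1, 2, 3]
del x[1:]           # remove a slice
assert x == [1]

d = {"a": 1, "b": 2}
del d["a"]          # remove a key
assert d == {"b": 2}

y = 5
del y               # unbind the name entirely
try:
    y
except NameError:
    pass
```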

2.7.8   The import statement

File can import either via import x or from x import y.

When Python imports a module, it first checks the module registry (sys.modules) to see if the module is already imported. If that’s the case, Python uses the existing module object as is. [47] Otherwise, Python does something like this:

  1. Create a new, empty module object (this is essentially a dictionary)
  2. Insert that module object in the sys.modules dictionary
  3. Load the module code object (if necessary, compile the module first)
  4. Execute the module code object in the new module’s namespace. All variables assigned by the code will be available via the module object.

This means that it's cheap to import an already imported module; Python just has to look the module name up in a dictionary. [47]
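The registry behaviour can be observed directly:

```python
import sys
import json                # first import executes and caches the module

assert 'json' in sys.modules
import json as json2       # already cached: just a dictionary lookup

assert json2 is sys.modules['json']   # same module object, not a copy
```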

If you run a module as a script (i.e. give its name to the interpreter, rather than importing it), it’s loaded under the module name __main__. If you then import the same module from your program, it’s reloaded and reexecuted under its real name. If you’re not careful, you may end up doing things twice. [47]

Relative imports

Relative imports are written with leading dots, e.g. from ..package import module.

Relative imports use the __name__ attribute of a module to determine that module's position in the package hierarchy. If the module's name does not contain any package information (e.g. it is set to '__main__') then relative imports are resolved as if the module were a top level module, regardless of where the module is actually located on the file system.

In short, a module that uses relative imports cannot also be run directly as a script (i.e. as __main__).

Circular imports

Modules are executed during import, and new functions and classes do not appear in the namespace of a module until the def or class statement has been executed. This has some interesting implications if you’re doing recursive imports. Consider a module X which imports module Y and then defines a function called spam:

# module X

import Y

def spam():
    print "function in module x"

If you import X from your main program, Python will load the code for X and execute it. When Python reaches the import Y statement, it loads the code for Y, and starts executing it instead. At this time, Python has installed module objects for both X and Y in sys.modules. But X doesn't contain anything yet; the def spam statement hasn’t been executed. Now, if Y imports X (a recursive import), it will get back a reference to an empty X module object. Any attempt to access X.spam() on the module level will fail:

# module Y

import X

X.spam() # doesn't work: spam isn't defined yet!

To fix this, either refactor the program to avoid circular imports (moving stuff to a separate module often helps), or move the imports to the end of the module (in this case, if you move import Y to the end of module X, everything will work just fine). [47]

Reversing imports

You can delete names from sys.modules to "unimport" things. You can use this to ensure that modules can be imported even if you need to import other modules that might otherwise cause the test to fail:

"""Tests to ensure Celery tasks can be run without errors."""
import sys
import unittest

# This causes side effects to set up the test, but also imports convenience
# functions which would make ``test_can_import`` pass if not cleaned up.
import adroll.dotcom.tests  # noqa

class TestImports(unittest.TestCase):
    def test_can_import(self):
        """Ensure backend.tasks.rollcrawl can be imported (i.e. no circular
        imports)."""
        self.given_no_adroll_modules_have_been_imported()
        import adroll.backend.tasks.rollcrawl  # noqa

    def given_no_adroll_modules_have_been_imported(self):
        """Ensure no modules from AdRoll have been imported, which may cause
        side effects."""
        # Python modules are cached in sys.modules, so we can just delete them
        # from the cache to "unimport" them.
        adroll_modules = {module_name for module_name in sys.modules if
                          module_name.startswith('adroll')}
        for module in adroll_modules:
            del sys.modules[module]

2.7.9   The pass statement

pass is a null operation -- when it is executed, nothing happens. It is useful as a placeholder when a statement is required syntactically, but no code needs to be executed.
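A tiny illustration (the names are invented for this sketch):

```python
# pass satisfies the grammar where a statement is required but there is
# nothing to do:
def not_implemented_yet():
    pass  # a body is required syntactically; nothing to execute yet

class PlaceholderError(Exception):
    pass  # the class body adds nothing beyond the base class

not_implemented_yet()  # runs and does nothing
```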

2.7.10   The print statement

print line is shorthand for print >> sys.stdout, line; the >> form redirects output to any object with a write method.

print does not require parentheses.

The print statement is roughly an abbreviation of sys.stdout.write, with a trailing newline added.

2.7.11   The raise statement

The simplest way to re-throw an exception if you need to perform a little work after the catch is with a simple raise statement [70]:
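The elided example presumably looked something like this sketch (do_something and log are illustrative names; the outer handler exists only to keep the snippet self-contained):

```python
def do_something():
    raise ValueError("boom")

log = []
try:
    try:
        do_something()
    except ValueError:
        log.append("cleanup")  # perform a little work after the catch
        raise                  # bare raise: re-throw the exception last caught
except ValueError as exc:
    # outer handler, only here to show the exception really propagated
    log.append(str(exc))

print(log)  # → ['cleanup', 'boom']
```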


Here the raise statement means, "throw the exception last caught".

If no expressions are present, raise re-raises the last exception that was active in the current scope.

—The Python Language Reference: Simple statements

However, if you want to raise an exception in a location other than the place where it was caught, it's insufficient to just throw the exception again, since that loses the traceback information:

class Worker(object):
    def work(self):
        try:
            self.result = something_dangerous()
        except Exception as e:
            # the traceback from something_dangerous() is not stored
            self.e = e

    def get_result(self):
        if self.e:
            # re-raising here loses the original traceback
            raise self.e
        return self.result

The proper way to do this is to use the full three-argument form of the raise statement with the original traceback:

import sys

try:
    something_dangerous()
except Exception:
    exc_info = sys.exc_info()
    # ... perform some work, then re-raise with the original traceback
    raise exc_info[1], None, exc_info[2]

The three-argument raise statement is a little odd, owing to its heritage from the old days of Python when exceptions could be things other than instances of subclasses of Exception. This accounts for the odd tuple-dance we do on the saved exc_info. [70]

If the first object is a class, it becomes the type of the exception.

The second object is used to determine the exception value: If it is an instance of the class, the instance becomes the exception value. If the second object is a tuple, it is used as the argument list for the class constructor; if it is None, an empty argument list is used, and any other object is treated as a single argument to the constructor. The instance so created by calling the constructor is used as the exception value.

If a third object is present and not None, it must be a traceback object, and it is substituted instead of the current location as the place where the exception occurred.

—The Python Language Reference: Simple statements

2.8   Compound statements

2.8.2   The for statement

Python's for statement would really be better named "for each"; unlike the C programming language or the Java programming language, it iterates over a collection.

One can append an else block to the end of a for block which is only executed if the for block runs without a break.

for i in range(10):
    if i == 10:
        break
else:
    print "no breaks!"
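A runnable sketch of the same idea (find is an invented helper): the else clause runs only when the loop finishes without hitting break.

```python
def find(needle, haystack):
    for item in haystack:
        if item == needle:
            break
    else:
        return "not found"   # runs only if the loop never hit break
    return "found"

print(find(3, [1, 2, 3]))  # → found
print(find(9, [1, 2, 3]))  # → not found
```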

2.8.3   The try statement

Limit the try clause to the absolute minimum amount of code necessary to avoid masking bugs.

Never use a bare except clause. Doing so risks catching exceptions that should not be caught, such as KeyboardInterrupt and SystemExit.

In addition to the normal try ... except structure, one can append an else block which runs only if the try block does not raise an exception. This is useful for isolating the part of the code that can throw an error from the code that cannot:

def cat(filename):
    try:
        f = open(filename, "rU")
    except IOError:
        print "No such file or directory"
    else:
        print f.read()
        f.close()

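Another runnable sketch of try/except/else (safe_int is an invented name):

```python
def safe_int(text):
    try:
        value = int(text)      # only this line can raise ValueError
    except ValueError:
        return None
    else:
        return value           # runs only when no exception was raised

print(safe_int("42"))    # → 42
print(safe_int("oops"))  # → None
```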

2.8.4   The while statement

Similar to the for construction:

i = 0
while i < 10:
    if i == 10:
        break
    i += 1
else:
    print "no breaks!"

2.8.5   The with statement

The with statement invokes a context manager.

Python's with statement supports the concept of a runtime context defined by a context manager. This is implemented using two separate methods that allow user defined classes to define a runtime context that is entered before the statement body is executed, and exited when the statement ends. [24]

The with statement was proposed in `PEP 310`_, accepted in PEP 343, and introduced in Python 2.5.

The with statement proposed in PEP 310 had the following syntax:

with (VAR =) EXPR:

which roughly translates into this:
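The elided translation is roughly this (a sketch of PEP 310's proposal; VAR, EXPR, and BLOCK as in the text):

```
VAR = EXPR
VAR.__enter__()
try:
    BLOCK
finally:
    VAR.__exit__()
```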


This translation was adequate for acquiring and freeing locks, but it was inadequate for opening and closing files, since it is the file object, rather than the filename, that should be in scope during BLOCK:

f = open(filename)
try:
    BLOCK
finally:
    f.close()

Consequently, Guido revised the translation in PEP 343 to the more general:

mgr = EXPR
VAR = mgr.__enter__()
try:
    BLOCK
finally:
    mgr.__exit__(...)

And, since VAR != EXPR, Guido revised the with statement in PEP 343 to:

with EXPR (as VAR):

At a more abstract level, context managers provide a way to abstract the following form [23]:
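The elided form is presumably the usual set-up/tear-down pattern. A minimal, self-contained context manager illustrating the order of the calls (Tracker is an invented name):

```python
class Tracker(object):
    """Records the order in which enter/body/exit run."""
    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")   # set-up runs first
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append("exit")    # tear-down runs even if the body raises
        return False                  # do not suppress exceptions

t = Tracker()
with t:
    t.events.append("body")
print(t.events)  # → ['enter', 'body', 'exit']
```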




With more than one item, the context managers are processed as if multiple with statements were nested:

with A() as a, B() as b:
    SUITE

is equivalent to:

with A() as a:
    with B() as b:
        SUITE

2.9   The Python runtime environment

The Python runtime environment consists of two parts: the core library and the standard library.

2.9.1   The core library

The core library is always available.

Built-in functions

enumerate

  • used in conjunction with for loops where indexing is important
  • for i, item in enumerate(items): ...

dir

Use dir to see the list of available methods on an object:

>>> a = "red"
>>> dir(a)
['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getslice__', '__gt__', '__hash__', '__init__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_formatter_field_name_split', '_formatter_parser', 'capitalize', 'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']

Without arguments, dir() lists the names in local scope:

>>> dir()
['__builtins__', '__doc__', '__name__', '__package__']
>>> a = 2
>>> import math
>>> dir()
['__builtins__', '__doc__', '__name__', '__package__', 'a', 'math']

help

Use help to pull up the docstring for a function or module:

>>> a = "red"
>>> help(a.zfill)  # pulls up zfill documentation
Help on built-in function zfill:

S.zfill(width) -> string

Pad a numeric string S with zeros on the left, to fill a field
of the specified width.  The string S is never truncated.

max & min
  • accept a key function to sort on
    • this allows us to find the max value in a dictionary via max(dict, key=lambda key: dict[key])
  • works well with lambda
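A quick sketch of the dictionary trick above:

```python
prices = {"apple": 3, "banana": 1, "cherry": 7}
# max iterates over the dict's keys; key= ranks them by their values.
dearest = max(prices, key=lambda k: prices[k])
print(dearest)  # → cherry
```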

map & reduce

map is faster than an equivalent list comprehension but is less readable.

Both are widely considered unreadable, and their use is discouraged.

So now reduce(). This is actually the one I've always hated most, because, apart from a few examples involving + or *, almost every time I see a reduce() call with a non-trivial function argument, I need to grab pen and paper to diagram what's actually being fed into that function before I understand what the reduce() is supposed to do. So in my mind, the applicability of reduce() is pretty much limited to associative operators, and in all other cases it's better to write out the accumulation loop explicitly. [10]

range
  • the step value must be an integer
    • (to get a non-integer step, write your own drange function)
  • should almost always be replaced by xrange (which became the default behavior of range in Python 3)

reload

Use reload to force the interactive interpreter to pick up changes to a module. (Importing the module again has no effect, because modules are cached.)

zip
  • takes multiple iterables and generates tuples by popping the first element off each list
  • e.g., a = [1, 2, 3], b = [4, 5, 6], c = zip(a, b) == [(1, 4), (2, 5), (3, 6)]
  • almost always should be replaced by itertools.izip

2.9.2   The standard library

Developers must import modules from the standard library as they need them.

PEP 2 and PEP 4 govern the standard library.

The Python Standard Library contributes significantly to Python's success. The language comes with "batteries included", so it is easy for people to become productive with just the standard library alone. [PEP2]

Many contributions to the library are not created by core developers but by people from the Python community who are experts in their particular field. [PEP2]

The standard library consists of nearly 300 modules. Some of the more important ones are documented below.

__future__

__future__ must be the first import in a file if it is used.

from __future__ import division overrides the / operator to perform true (float) division and adds the // operator for floor division.
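A sketch of the effect:

```python
from __future__ import division  # must come before any other code

print(7 / 2)   # true division → 3.5
print(7 // 2)  # floor division → 3
```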

array

Avoid this module. Use lists instead. If you need performance, use PyPy or numpy_ arrays.

bdb

The bdb module handles basic debugger functions, like setting breakpoints or managing execution via the debugger.

calendar

This module allows you to output calendars like the Unix cal program, and provides additional useful functions related to the calendar.

collections

Author:Raymond Hettinger

New in version 2.4.

  • defaultdict
  • namedtuple

According to the docs, default values for namedtuple can be implemented by using _replace() to customize a prototype instance:

>>> Account = namedtuple('Account', 'owner balance transaction_count')
>>> default_account = Account('<owner name>', 0.0, 0)
>>> johns_account = default_account._replace(owner='John')

To use default arguments with namedtuple, subclass the result of namedtuple and override __new__:

from collections import namedtuple
class Move(namedtuple('Move', 'piece start to captured promotion')):
    def __new__(cls, piece, start, to, captured=None, promotion=None):
        # add default values
        return super(Move, cls).__new__(cls, piece, start, to, captured, promotion)

Notice we pass cls to both super() and __new__(). This is because __new__() is a static method, not a class method, so it requires an explicit passing of cls.
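Usage of such a subclass, restated here so the snippet stands alone:

```python
from collections import namedtuple

class Move(namedtuple('Move', 'piece start to captured promotion')):
    def __new__(cls, piece, start, to, captured=None, promotion=None):
        # supply default values for the last two fields
        return super(Move, cls).__new__(cls, piece, start, to, captured, promotion)

m = Move('knight', 'g1', 'f3')
print(m.piece)     # → knight
print(m.captured)  # → None
```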

ConfigParser

For parsing files with a structure similar to Microsoft Windows INI files. You can use this to write Python programs which end users can easily customize.

The configuration file consists of sections, led by a [section] header and followed by name: value entries, with continuations in the style of RFC 822 (see section 3.1.1, “LONG HEADER FIELDS”); name=value is also accepted.

Configuration files may include comments, prefixed by specific characters (# and ;). Comments may appear on their own in an otherwise empty line, or may be entered in lines holding values or section names. In the latter case, they need to be preceded by a whitespace character to be recognized as a comment. (For backwards compatibility, only ; starts an inline comment, while # does not.)

On top of the core functionality, SafeConfigParser supports interpolation. This means values can contain format strings which refer to other values in the same section, or values in a special DEFAULT section.

For example:

[My Section]
foodir: %(dir)s/whatever
dir=frob
long: this value continues
   in the next line

would resolve the %(dir)s to the value of dir (frob in this case). All reference expansions are done on demand.

Use the read() method of SafeConfigParser to read the configuration file:

from ConfigParser import SafeConfigParser

parser = SafeConfigParser()'simple.ini')

print parser.get('bug_tracker', 'url')

cmd


cmd provides a simple framework for writing line-oriented command interpreters.

datetime

Be careful with datetime.fromtimestamp(); when tests run on a machine in a different part of the world, its value will change, which could cause tests to break. Instead, use datetime.utcfromtimestamp() until display time.

difflib


The difflib module contains a class, SequenceMatcher, which compares two sequences and computes the changes required to transform one sequence into the other. For example, this module can be used to write a tool similar to the Unix diff program, and in fact the sample program Tools/scripts/ demonstrates how to write such a script. [61]

distutils


The main package for the Python Module Distribution Utilities. Normally used from a setup script as:

from distutils.core import setup

setup (...)

doctest

Author:`Tim Peters`_
Date:March 6, 1999

doctest is used for testing that docstrings are correct. It works by looking for docstrings which look like interactive Python sessions, and then executes those sessions to verify that they work as shown:

def divide(x, y):
    """Do integer division with x and y.

    If y == 0, then a ZeroDivisionError is raised.

    >>> divide(2, 1)
    2
    >>> divide(1, 2)
    0
    >>> divide(10, 0)
    Traceback (most recent call last):
        ...
    ZeroDivisionError: integer division or modulo by zero
    """
    return x / y

if __name__ == "__main__":
    import doctest
    doctest.testmod()

To run the tests, just execute the file at the command-line:

$ python -v
Trying:
    divide(2, 1)
Expecting:
    2
ok
Trying:
    divide(1, 2)
Expecting:
    0
ok
Trying:
    divide(10, 0)
Expecting:
    Traceback (most recent call last):
        ...
    ZeroDivisionError: integer division or modulo by zero
ok
1 items had no tests:
    __main__
1 items passed all tests:
   3 tests in __main__.divide
3 tests in 2 items.
3 passed and 0 failed.
Test passed.

Since Python 2.6, there is also a command line shortcut for running testmod(). You can instruct the Python interpreter to run the doctest module directly from the standard library and pass the module name(s) on the command line:

python -m doctest -v

doctest is _not_ a replacement for unit testing. However, for programs where correctness is not critical (e.g., gists), doctest is a fine way to increase confidence that code works and that documentation is up-to-date.

doctest uses directives to control whether or not an output is accepted. Some useful ones include NORMALIZE_WHITESPACE which ignores whitespace issues:

>>> print range(20) # doctest: +NORMALIZE_WHITESPACE
[0,   1,  2,  3,  4,  5,  6,  7,  8,  9,
10,  11, 12, 13, 14, 15, 16, 17, 18, 19]

And ELLIPSIS which can match any substring in the actual output:

>>> print range(20) # doctest: +ELLIPSIS
[0, 1, ..., 18, 19]


A decorated function is replaced by the wrapper object the decorator returns, and the wrapper does not implicitly get the __doc__ attribute of the original function (which is what doctest parses).

To correct this, apply functools.wraps to the wrapper:

import functools

def decorator(func):
    @functools.wraps(func)
    def wrapper(*args):
        # ...
        return func(*args)
    return wrapper
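A runnable sketch showing the effect (decorator and double are illustrative names):

```python
import functools

def decorator(func):
    @functools.wraps(func)      # copies __doc__, __name__, etc. onto wrapper
    def wrapper(*args):
        return func(*args)
    return wrapper

@decorator
def double(x):
    """Double x."""
    return 2 * x

print(double.__doc__)   # → Double x.
print(double.__name__)  # → double
```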


doctest does not play well with Unicode. doctest seems to convert codepoints to Unicode glyphs, even where the interpreter would not:

def foo():
    """
    >>> micro = '\xc2\xb5'
    >>> micro.decode('utf-8')
    u'\xb5'
    """

Running doctest on this produces:

    File "", line 7, in
    Failed example:
        micro.decode('utf-8')

To fix this, make assertions that evaluate to something besides Unicode characters:

def foo():
    """
    >>> micro = '\xc2\xb5'
    >>> micro.decode('utf-8') == u'\xb5'
    True
    """

Further reading:

email

Author:Barry Warsaw
Maintainers:email package Special Interest Group (SIG)

The email library was added in Python 2.2 (December 21, 2001).

email arose from Warsaw's work on Mailman. [25]

Mailman is free software for managing electronic mail discussion and e-newsletter lists. Mailman is integrated with the web, making it easy for users to manage their accounts and for list owners to administer their lists. Mailman supports built-in archiving, automatic bounce processing, content filtering, digest delivery, spam filters, and more.

email was updated to version 3.0 in Python 2.4. [26]

The email package was updated to version 3.0, which dropped various deprecated APIs and removes support for Python versions earlier than 2.3. The 3.0 version of the package uses a new incremental parser for MIME messages, available in the email.FeedParser module. The new parser doesn’t require reading the entire message into memory, and doesn’t raise exceptions if a message is malformed; instead it records any problems in the defect attribute of the message. (Developed by Anthony Baxter, Barry Warsaw, Thomas Wouters, and others.) [26]

email was updated to version 4.0 in Python 2.5. [27]

fractions

  • Stores a numerator and denominator, allowing exact arithmetic (e.g. Fraction(1, 7) * 7 == 1 exactly, whereas with floats sum(1.0 / 7 for i in range(7)) gives 0.9999...)
  • Fractions work like any other number, so you can use +, -, *, / as normal
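A sketch of the exactness claim:

```python
from fractions import Fraction

# Fractions are exact, so repeated sums do not accumulate rounding error:
exact = sum(Fraction(1, 7) for _ in range(7))
print(exact == 1)  # → True

inexact = sum(1.0 / 7 for _ in range(7))
print(inexact == 1.0)  # → False (float rounding)
```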

getopt

Deprecated. See argparse.

imaplib

Author:Piers Lauder

imaplib was written before Python 1.5 (1999) and before IMAP4rev1 (2003). For modern development with IMAP, I recommend using IMAPClient.

Useful tutorial:

inspect


inspect is used for advanced introspection.

Get the arguments of a function with inspect.getargspec (see the example below).


Get the source code of an object:

>>> print inspect.getsource(inspect.getsource)
def getsource(object):
    """Return the text of the source code for an object.

    The argument may be a module, class, method, function, traceback, frame,
    or code object.  The source code is returned as a single string.  An
    IOError is raised if the source code cannot be retrieved."""
    lines, lnum = getsourcelines(object)
    return string.join(lines, '')

Find the base classes of an object in method resolution order:

>>> inspect.getmro(ValueError)
(<type 'exceptions.ValueError'>, <type 'exceptions.StandardError'>, <type 'exceptions.Exception'>, <type 'exceptions.BaseException'>, <type 'object'>)

Find the arguments and keyword arguments of a function:

>>> pprint.pprint(inspect.getargspec(pprint.pprint))
ArgSpec(args=['object', 'stream', 'indent', 'width', 'depth'], varargs=None, keywords=None, defaults=(None, 1, 80, None))

itertools

Author:Raymond Hettinger
  • product(*args) - returns the Cartesian product of multiple iterables
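For example:

```python
from itertools import product

# product(*iterables) is equivalent to nested for loops:
print(list(product([1, 2], "ab")))
# → [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
```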

logging

Author:Vinay Sajip

The logging module defines a standard API for reporting errors and status information from applications and libraries. The key benefit of having the logging API provided by a standard library module is that all Python modules can participate in logging, so an application’s log can include messages from third-party modules. [43]

This module has an old, Java-like API (note the camelCase names). There is no way, for instance, to declare a formatter on a Handler without calling setFormatter.

The simplest way to get started is via basicConfig:

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)'Start reading database')

logging.basicConfig(level=logging.INFO) may activate loggers of other libraries that you are using, for example requests. To turn them off, you can use:
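The elided snippet is presumably along these lines (the logger name depends on the noisy library; "requests" matches the example in the text):

```python
import logging

# Raise the threshold of a chatty third-party logger so its INFO/DEBUG
# records are dropped while your own logger stays at INFO.
logging.getLogger("requests").setLevel(logging.WARNING)
```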


A cleaner approach that replicates the functionality of basicConfig with a single logger might look like:

logger = logging.getLogger(__name__)
_handler = logging.StreamHandler()
_handler.setFormatter(logging.Formatter(logging.BASIC_FORMAT))
logger.addHandler(_handler)
logger.setLevel(logging.INFO)

A useful feature of the logging API is the ability to produce messages at different levels. This allows developers to log debug messages during development, and hide them in production. [43]

A Handler instance dispatches logging events to specific destinations.

Logs print to standard error by default.

Logging levels

========  =====
Level     Value
========  =====
CRITICAL  50
ERROR     40
WARNING   30
INFO      20
DEBUG     10
NOTSET    0
========  =====

The logger, handler, and log message call each specify a level. The log message is only emitted if the handler and logger are configured to emit messages of that level or higher. For example, if a message is CRITICAL, and the logger is set to ERROR, the message is emitted (50 > 40). If a message is a WARNING, and the logger is set to produce only messages set to ERROR, the message is not emitted (30 < 40). [43]

multiprocessing

Author:Jesse Noller

From the docs:

multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine.

The multiprocessing API is identical to the threading API except that its core object is different:

from multiprocessing import Process

def f(name):
    print 'hello', name

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()

It also has a lock which is shared across processes, multiprocessing.Lock.

mutex


The mutex module defines a class that allows mutual exclusion via acquiring and releasing locks.

new

The new module provides an interface to the interpreter's object creation functions.

optparse

Deprecated. See argparse.

os

  • getcwd - gets the current working directory

pdb


pdb, like gdb, is Python's interactive debugging tool. It supports setting breakpoints and stepping through code.

To use it, simply insert the following lines in your program like so:

a = "aaa"
import pdb; pdb.set_trace()
b = "bbb"
c = "ccc"
final = a + b + c
print final

Then run your program. When it trips set_trace it will open up an interactive prompt.

  • Enter "n" to go to the next line.
  • Repeat the last command by just hitting Enter.
  • Enter "q" to quit.
  • Enter "p [v, ...]" to display the value of a variable.
  • Enter "c" to let the program continue.
  • Enter "l" to see where you are in the program.
  • Enter "s" to step into a subroutine.
  • Enter "r" to exit a subroutine.
  • Enter "h" for help.

Here is an example session:

$ python
-> b = "bbb"
(Pdb) l
1     a = "aaa"
2     import pdb; pdb.set_trace()
3  -> b = "bbb"
4     c = "ccc"
5     final = a + b + c
6     print final
(Pdb) p a
'aaa'
(Pdb) n
-> c = "ccc"
(Pdb) l
1     a = "aaa"
2     import pdb; pdb.set_trace()
3     b = "bbb"
4  -> c = "ccc"
5     final = a + b + c
6     print final
(Pdb) p c
*** NameError: NameError("name 'c' is not defined",)
(Pdb) c

You can also execute arbitrary Python code while the program is running to modify its behavior. To this, prefix a Python statement with a ! (this makes sure your command isn't interpreted as a pdb command). For instance:

$ python
-> b = "bbb"
(Pdb) n
-> c = "ccc"
(Pdb) p b
'bbb'
(Pdb) !1 + 2
3
(Pdb) !print b
bbb
(Pdb) !b = "BBB"
(Pdb) c
aaaBBBccc

pickle

Use pickle for serialization.

Don't use pickle where json or a database makes more sense.
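A minimal round-trip sketch:

```python
import pickle

data = {"answer": 42, "items": [1, 2, 3]}
blob = pickle.dumps(data)       # serialize to a byte string
restored = pickle.loads(blob)   # reconstruct an equal object
print(restored == data)  # → True
```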


pprint

Use pprint to print data in an intelligible way. It's particularly useful when printing lots of data:

>>> import pprint
>>> tup = ('spam', ('eggs', ('lumberjack', ('knights', ('ni', ('dead',
... ('parrot', ('fresh fruit',))))))))
>>> stuff = ['a' * 10, tup, ['a' * 30, 'b' * 30], ['c' * 20, 'd' * 20]]
>>> pprint.pprint(stuff)
['aaaaaaaaaa',
 ('spam',
  ('eggs',
   ('lumberjack',
    ('knights', ('ni', ('dead', ('parrot', ('fresh fruit',)))))))),
 ['aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', 'bbbbbbbbbbbbbbbbbbbbbbbbbbbbbb'],
 ['cccccccccccccccccccc', 'dddddddddddddddddddd']]
>>> pprint.pprint(stuff, depth=3)
['aaaaaaaaaa',
 ('spam', ('eggs', (...))),
 ['aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', 'bbbbbbbbbbbbbbbbbbbbbbbbbbbbbb'],
 ['cccccccccccccccccccc', 'dddddddddddddddddddd']]
>>> pprint.pprint(stuff, width=60)
['aaaaaaaaaa',
 ('spam',
  ('eggs',
   ('lumberjack',
    ('knights',
     ('ni', ('dead', ('parrot', ('fresh fruit',)))))))),
 ['aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa',
  'bbbbbbbbbbbbbbbbbbbbbbbbbbbbbb'],
 ['cccccccccccccccccccc', 'dddddddddddddddddddd']]

profile


cProfile and profile provide deterministic profiling of Python programs.

pydoc

Author:Ka-Ping Yee [61]

Tools/scripts/pydoc, which is automatically installed, uses pydoc to display documentation for a given Python module, package or class name. For example, pydoc os shows help(os):

pydoc - the Python documentation tool

pydoc <name> ...
    Show text documentation on something.  <name> may be the name of a
    Python keyword, topic, function, module, or package, or a dotted
    reference to a class or function within a module or module in a
    package.  If <name> contains a '/', it is used as the path to a
    Python source file to document. If name is 'keywords', 'topics',
    or 'modules', a listing of these things is displayed.

pydoc -k <keyword>
    Search for a keyword in the synopsis lines of all available modules.

pydoc -p <port>
    Start an HTTP server on the given port on the local machine.

pydoc -g
    Pop up a graphical interface for finding and serving documentation.

pydoc -w <name> ...
    Write out the HTML documentation for a module to a file in the current
    directory.  If <name> contains a '/', it is treated as a filename; if
    it names a directory, documentation is written for all the contents.

Also try pydoc pydoc at the terminal.

In Vim, hitting Shift+K over a word (a Python module, keyword, or built-in) runs pydoc $WORD.

random

  • sample - choose k random elements from a population of size n

readline

The readline module defines a number of functions to facilitate completion and reading/writing of history files from the Python interpreter. This module can be used directly or via the rlcompleter module.   string

The string module contains a number of useful constants and classes, as well as some deprecated legacy functions that are also available as methods on strings. In addition, Python’s built-in string classes support the sequence type methods described in the Sequence Types — str, unicode, list, tuple, bytearray, buffer, xrange section, and also the string-specific methods described in the String Methods section. To output formatted strings use template strings or the % operator described in the String Formatting Operations section. Also, see the re module for string functions based on regular expressions.

subprocess

Popen takes a list of strings when called regularly (just like the underlying execve functions). It may seem awkward, but it saves you the trouble of sanitizing and quoting your arguments correctly.

sys

  • argv - the list of command-line arguments passed to the script

threading

Locks should be used with with statements.


import threading

def worker(a, b):
    print a, b

thread = threading.Thread(target=worker, args=(1, 2))
thread.start()

To use a Queue:

import threading
import Queue

q = Queue.Queue()

def worker():
    while True:
        data = q.get()
        print data


thread = threading.Thread(target=worker)
thread.start()

To run tasks in parallel:

import threading

class Mapper(threading.Thread):
    def __init__(self, func, item):
        self.func = func
        self.item = item
        self.result = None
        super(Mapper, self).__init__()

    def run(self):
        self.result = self.func(self.item)

def thread_map(func, iterable):
    """Run a function in parallel with threads."""
    threads = []
    for item in iterable:
        thread = Mapper(func, item)
        thread.start()
        threads.append(thread)

    for thread in threads:
        thread.join()
        yield thread.result

To use a lock, use threading.Lock:

lock = threading.Lock()
with lock:
    print "hi"

unittest

Author:Steve Purcell [61]

unittest (= PyUnit) is Python's standard unit-test library. It is built to mimic the XUnit series (e.g. JUnit).

unittest was first released in 1999 and has been part of the Python standard library since Python 2.1. [61] [62] PyUnit was used to test Zope, the largest and best-known piece of Python software at the time. [62]

Python 2.7 made several changes to unittest.

  • Duplicate ways of spelling methods have been deprecated. [60]

    • assert_ -> use assertTrue instead
    • fail* -> use assert* instead
    • assertEquals -> use assertEqual instead
  • unittest is now a package instead of a module.

  • addCleanup and doCleanups were added, which make tearDown largely obsolete.

    unittest only calls tearDown if setUp doesn't raise any exceptions:

    If the setup fixture fails, no tests are run and the teardown fixture isn't run, either.

    However, if setUp sets up more than one thing, then tearDown should run anyway. [63] (This is not a problem for pytest, since request.addfinalizer behaves the same way as addCleanup.) Purcell argues that if setUp fails, it should handle the failure right there, since that's easier than coding defensively in tearDown. [64]


    I ran into some problems with errors in context setup and teardown. Depending on how a test is run, nose is not tearing down contexts properly if there is an error in setup or teardown. If nose is run with a path to the specific test module to run, it tries to set up the context by running the setups from all the packages above it. While iterating through the list of contexts to setup, if one of them fails, the contexts already set up do not get torn down.

    Further reading:

warnings


Warning messages are typically issued in situations where it is useful to alert the user of some condition in a program, where that condition (normally) doesn’t warrant raising an exception and terminating the program. For example, one might want to issue a warning when a program uses an obsolete module.


import warnings

def f():
    warnings.warn("deprecated", DeprecationWarning)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    f()

Within the context manager, all warnings are simply ignored.

warnings.warn(message[, category[, stacklevel]])
Issue a warning, or maybe ignore it or raise an exception.

2.10   Python virtual machine

The Python Virtual Machine has a number of different implementations: CPython is Python implemented in the C programming language; PyPy is Python implemented in Python; Jython is implemented in Java and runs on the Java VM; and IronPython is the Python implementation for Microsoft's .NET CLR.


  • python -i enters the interactive interpreter after running the script

PYTHONPATH

An environment variable that can be used to augment the default module search path. Think of it as a PATH variable, but specifically for Python. It is simply a list of directories separated by : (not a Python list like sys.path) containing Python modules. It can be set as follows:

export PYTHONPATH=/path/to/some/directory:$PYTHONPATH

PYTHONSTARTUP

An environment variable with the path to a Python file. The Python commands in that file are executed before the first prompt is displayed in interactive mode. The file is executed in the same namespace where interactive commands are executed, so that objects defined or imported in it can be used without qualification in the interactive session. For example:


import pprint
import readline
import rlcompleter  # noqa - enables completion support
import sys

sys.displayhook = pprint.pprint
readline.parse_and_bind("bind ^I rl_complete")

Then in .bash_profile add the line export PYTHONSTARTUP=$HOME/

3   Properties

3.1   Interpreted

Python is an `interpreted language`_ though this property can be blurry because of the presence of the bytecode compiler.

3.2   Documentation

A docstring is a string literal that occurs as the first statement in a module, function, class, or method definition. While ignored when the suite is executed, it is recognized by the compiler and put into the __doc__ attribute of the enclosing class, function or module. Since it is available via introspection, it is the canonical place for documentation of the object. [50]

For example, consider the Counter docstring. Notice that it mentions what it's for and what it subclasses:

Dict subclass for counting hashable items.  Sometimes called a bag
or multiset.  Elements are stored as dictionary keys and their counts
are stored as dictionary values.

For an example of how to document a module see the collections module:

This module implements specialized container datatypes providing alternatives to
Python's general purpose built-in containers, :class:`dict`, :class:`list`,
:class:`set`, and :class:`tuple`.

=====================   ====================================================================  ===========================
:func:`namedtuple`      factory function for creating tuple subclasses with named fields      .. versionadded:: 2.6
:class:`deque`          list-like container with fast appends and pops on either end          .. versionadded:: 2.4
:class:`Counter`        dict subclass for counting hashable objects                           .. versionadded:: 2.7
:class:`OrderedDict`    dict subclass that remembers the order entries were added             .. versionadded:: 2.7
:class:`defaultdict`    dict subclass that calls a factory function to supply missing values  .. versionadded:: 2.5
=====================   ====================================================================  ===========================

In addition to the concrete container classes, the collections module provides
:ref:`abstract base classes <collections-abstract-base-classes>` that can be
used to test whether a class provides a particular interface, for example,
whether it is hashable or a mapping.

There are two forms of docstrings: one-liners and multi-line docstrings. One-liners are for really obvious cases and should fit on one line. A multi-line docstring consists of a summary line just like a one-line docstring, followed by a blank line, followed by a more elaborate description. In either case, docstrings should be surrounded by triple double quotes (""") both for consistency and to make it easier to change forms later. [50]

One-line docstrings should not be a "signature" reiterating the function/method parameters (which can be obtained by introspection), unless they are C functions (such as built-ins), where introspection is not possible:

def function(a, b):
    """function(a, b) -> list"""

However, docstrings should mention the nature of the return value since it cannot be determined by introspection:

def function(a, b):
    """Do X and return a list."""

A docstring prescribes the function's or method's effect in the imperative mood (i.e. as a command: "Do this.", "Return that."), not as a description; e.g. don't write "Returns the pathname...". [50] To eliminate "Given" and get the active voice in Python:

# bad

def find(number):
    """Given a sorted, but possibly shifted, list of numbers, find the
    index of *number*."""

# good

def find(number):
    """Find the index of *number* in a sorted, but possibly shifted,
    list of numbers."""

References to parameters should be italicized (surrounded by *).

Documentation is conventionally written in reStructuredText.

A blank line should follow docstrings (one-line or multi-line) that document a class -- generally speaking, the class's methods are separated from each other by a single blank line, and the docstring needs to be offset from the first method by a blank line.

The docstring of a script (a stand-alone program) should be usable as its "usage" message, printed when the script is invoked with incorrect or missing arguments (or perhaps with a "-h" option, for "help").

The docstring for a module should generally list the classes, exceptions and functions (and any other objects) that are exported by the module, with a one-line summary of each. (These summaries generally give less detail than the summary line in the object's docstring.) For example, the os module writes:

OS routines for Mac, NT, or Posix depending on what system we're on.

This exports:
  - all functions from posix, nt, os2, or ce, e.g. unlink, stat, etc.
  - os.path is one of the modules posixpath, or ntpath
  - os.name is 'posix', 'nt', 'os2', 'ce' or 'riscos'
  - os.curdir is a string representing the current directory ('.' or ':')
  - os.pardir is a string representing the parent directory ('..' or '::')
  - os.sep is the (or a most common) pathname separator ('/' or ':' or '\\')
  - os.extsep is the extension separator ('.' or '/')
  - os.altsep is the alternate pathname separator (None or '/')
  - os.pathsep is the component separator used in $PATH etc
  - os.linesep is the line separator in text files ('\r' or '\n' or '\r\n')
  - os.defpath is the default search path for executables
  - os.devnull is the file path of the null device ('/dev/null', etc.)

Programs that import and use 'os' stand a better chance of being
portable between different platforms.  Of course, they must then
only use functions that are defined by all platforms (e.g., unlink
and opendir), and leave all pathname manipulation to os.path
(e.g., split and join).

The docstring for a package (i.e., the docstring of the package's __init__.py module) should also list the modules and subpackages exported by the package.

The docstring of a function or method should summarize its behavior and document its arguments, return value(s), side effects, exceptions raised, and restrictions on when it can be called (all if applicable). Optional arguments should be indicated. It should be documented whether keyword arguments are part of the interface.

All modules should normally have docstrings, and all functions and classes exported by a module should also have docstrings. Public methods (including the __init__ constructor) should also have docstrings. A package may be documented in the module docstring of the file in the package directory.

A docstring placed at the top of the module (here, a module named cat) that looks like:

"""cat - abc

Lorem ipsum.
"""

Will generate the following when help() is called on it:

Help on module cat:

    cat - abc


    Lorem ipsum.

In contrast, a docstring placed at the top of the module that looks like:

"""
Lorem ipsum.
"""

Will generate the following when help() is called on it:

Help on module cat:




    Lorem ipsum.

The first line of the docstring, assuming an empty line follows it, is taken as the name and summary of the module, and everything after is the description.

In any particular module, __doc__ refers to the docstring of the module. Consider two modules which print out __doc__, one importing the other (the module names foo and bar are illustrative):

# foo.py
"""The docstring of foo."""
print __doc__

# bar.py
"""The docstring of bar."""
import foo
print __doc__

Executing the second will cause both docstrings to be printed:

$ python bar.py
The docstring of foo.
The docstring of bar.

Docstrings for functions should not begin with "Return" since that is verbose; functions inherently return things.

Inside Python object description directives, reST field lists with fields are recognized and formatted nicely:

  • param, parameter, arg, argument, key, keyword: Description of a parameter.
  • type: Type of a parameter. Creates a link if possible.
  • raises, raise, except, exception: That (and when) a specific exception is raised.
  • var, ivar, cvar: Description of a variable.
  • vartype: Type of a variable. Creates a link if possible.
  • returns, return: Description of the return value.
  • rtype: Return type. Creates a link if possible.

For example:

def send_message(sender, recipient, message_body, [priority=1])
   Send a message to a recipient

   :param str sender: The person sending the message
   :param str recipient: The recipient of the message
   :param str message_body: The body of the message
   :param priority: The priority of the message, can be a number 1-5
   :type priority: integer or None
   :return: the message id
   :rtype: int
   :raises ValueError: if the message_body exceeds 160 characters
   :raises TypeError: if the message_body is not a basestring

It is also possible to combine parameter type and description, if the type is a single word, like this:

:param int priority: The priority of the message, can be a number 1-5

If in Sphinx you have a reference like :class:`.mapper`, and later mapper() becomes a function that calls through to Mapper, then all the :class:`.mapper` links are broken.

For functions which are similar:

Like dict.update() but subtracts counts instead of replacing them.

3.3   Interfaces

Python's design implies the creation of "weak" interfaces:

  • objects that implement an "interface" should just implement each of the methods in the interface -- no need to be explicit
  • also consider abstract base classes (abcs)

Python has no syntax to implement interfaces. An attempt was made in 2001 with PEP 245 but after five years of flirting with integrating static types, Guido rejected it.

There is a need for interfaces. Python's answer is the "protocol", which is just an announcement that if your object implements a set of methods with specified type signatures, it will work. This is conventionally called "duck typing", from the phrase, "If it looks like a duck, walks like a duck, and quacks like a duck, it's probably a duck."

In Python, functions should accept whatever type contains the information needed.

Python's protocols are mostly implemented through double-underscore ("dunder") methods. They are complicated and not worth discussing here.

3.4   Performance

Compared to other languages, Python is relatively slow.

Python favors speed of development over speed of code. Focus on acceptable speed rather than strict optimization.

Function overhead is relatively high. Use x ** 2 rather than math.pow(x, 2).

Many core building blocks are coded in optimized C. Applications that take advantage of them can make substantial performance gains. The building blocks include all of the builtin datatypes (lists, tuples, sets, and dictionaries) and extension modules like array, itertools, and collections.deque.

Likewise, the builtin functions run faster than hand-built equivalents. For example, map(operator.add, v1, v2) is faster than map(lambda x,y: x+y, v1, v2).

Lists perform well as either fixed-length arrays or variable-length stacks. However, for queue applications using pop(0) or insert(0, v), collections.deque() offers superior O(1) performance because it avoids the O(n) cost of shifting every remaining element on each insertion or deletion at the front.

  • Catching exceptions in Python is relatively cheap: setting up a try/except block costs little; the expense comes only when an exception is actually raised.
  • "Remove dots": attribute lookups like foo.bar are resolved at runtime, so in a hot loop, bind the attribute to a local name once and use the local.
  • In Python 2, while 1 is faster than while True because True is a global name that must be looked up on every iteration, whereas 1 is a constant. (In Python 3, True is a keyword and the two are equivalent.)
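The "remove dots" advice can be sketched as follows; both functions build the same list, but the second hoists the attribute lookup out of the loop (function names are illustrative):

```python
def build_slow(n=10000):
    result = []
    for i in range(n):
        result.append(i)      # result.append is looked up on every pass
    return result

def build_fast(n=10000):
    result = []
    append = result.append   # bind the bound method to a local name once
    for i in range(n):
        append(i)
    return result

# Same output; build_fast avoids n attribute lookups.
assert build_slow(100) == build_fast(100)
```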

3.4.1   Recursion

Python sets a maximum limit on the interpreter stack to prevent overflows on the C stack that might crash Python. This limit is platform dependent. Increasing the limit may lead to a crash.

import sys

assert sys.getrecursionlimit() == 1000

def fact(n):
    return 1 if n <= 1 else n * fact(n - 1)

# NOTE: This succeeds at 996 because the enclosing frames also occupy
# the stack; at the top level, 999 would work.
fact(996)   # OK
fact(1000)  # raises RuntimeError: maximum recursion depth exceeded

A commonly proposed solution to Python's recursion problem is tail call optimization. Guido has rejected this proposal for a number of reasons...
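In the absence of tail call optimization, deep recursion is usually rewritten as a loop; a sketch with a hypothetical factorial:

```python
import sys

def fact_recursive(n):
    # Fails once the call depth approaches sys.getrecursionlimit()
    # (RuntimeError; in Python 3.5+ the subclass RecursionError).
    return 1 if n <= 1 else n * fact_recursive(n - 1)

def fact_iterative(n):
    # Constant stack space: the recursion limit does not apply.
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

assert fact_recursive(10) == fact_iterative(10) == 3628800
try:
    fact_recursive(sys.getrecursionlimit() + 1)
except RuntimeError:
    pass  # exceeded the interpreter stack limit, as expected
```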

3.5   Concurrency

3.5.1   The Global Interpreter Lock

The Python interpreter is not fully thread-safe. In order to support multi-threaded Python programs, there’s a global lock, called the global interpreter lock or GIL, that must be held by the current thread before it can safely access Python objects. Without the lock, even the simplest operations could cause problems in a multi-threaded program: for example, when two threads simultaneously increment the reference count of the same object, the reference count could end up being incremented only once instead of twice. [41]

Therefore, the rule exists that only the thread that has acquired the GIL may operate on Python objects or call Python/C API functions. In order to emulate concurrency of execution, the interpreter regularly tries to switch threads (see sys.setcheckinterval()). The lock is also released around potentially blocking I/O operations like reading or writing a file, so that other Python threads can run in the meantime. [41]

In effect, only one thread runs Python bytecode at any given moment: threads are real OS threads, but the GIL serializes their execution, so CPU-bound threads gain no parallelism.

4   Rhetoric

Code that adheres to Python's style and idioms_ is called "Pythonic".

The Python language actively encourages a large number of idioms to accomplish a number of tasks ("the one way to do it").


To concatenate ("flatten") several lists into one:

flatten = lambda *x: sum(x, [])

To divide a list into parts of equal length (elements that do not fill a complete part are dropped):

zip(*[iter([1, 2, 3, 4, 5, 6, 7])]*2) == [(1, 2), (3, 4), (5, 6)]

If you need to use a sentinel object, use:

sentinel = object()

Comparisons use identity: object() instances define no custom __eq__, so both is and == identify the sentinel. Flask uses this in flask.ctx.
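A sentinel distinguishes "no argument passed" from "None was passed"; a minimal sketch with a hypothetical lookup() helper:

```python
_missing = object()  # unique sentinel: identical only to itself

def lookup(mapping, key, default=_missing):
    # None is a legitimate default value here, so None cannot be used
    # to mean "no default was given"; the sentinel can.
    if key in mapping:
        return mapping[key]
    if default is _missing:
        raise KeyError(key)
    return default

assert lookup({'a': 1}, 'a') == 1
assert lookup({}, 'a', None) is None
```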

4.1   Style

You should care about style because...

Style is standardized in PEP 8.

A linter can be used to enforce good style and do simple static analysis. I recommend flake8, which combines two other tools, pep8 and pyflakes.


Always compare variables with constants; never compare constants with variables:

# Bad
if 'md5' == method:

# Good
if method == 'md5':

On negated containment checks:

# Bad
not foo in bar

# Good
foo not in bar

Align to the braces if you break a statement with braces:

# Bad
this_is_a_very_long(function_call, 'with many parameters',
    23, 42, 'and even more')

# Good
this_is_a_very_long(function_call, 'with many parameters',
                    23, 42, 'and even more')

Break immediately after the opening brace for containers with many items:

items = [
    'this is the first', 'set of items', 'with more items',
    'to come in this line', 'like this'
]

Top level functions and classes are separated by two lines, everything else by one. Do not use too many blank lines to separate logical segments in code.

If a comment is used to document an attribute, put a colon after the opening pound sign (#):

class User(object):
    #: the name of the user as unicode string
    name = Column(String)
    #: the sha1 hash of the password + inline salt
    pw_hash = Column(String)

4.1.1   Naming

  • underscore naming scheme for variables and functions, e.g. my_variable
  • camel case for classes, e.g. MyClass
  • all caps for module-level constants, e.g. MAX_SIZE
  • A single trailing underscore is used to avoid name conflicts with keywords, e.g. class_.


You should avoid using any names which pdb uses, e.g. "n", "r", since you will not be able to access them from inside pdb if you do. A simple way to (mostly) avoid this is to use names that are at least two characters long.

4.1.2   Comparison to booleans

It has been suggested that, in order to satisfy user expectations, for every x that is considered true in a Boolean context, the expression x == True should be true, and likewise if x is considered false, x == False should be true. In particular newbies who have only just learned about Boolean variables are likely to write

if x == True: ...

instead of the correct form,

if x: ...

There seem to be strong psychological and linguistic reasons why many people are at first uncomfortable with the latter form, but I believe that the solution should be in education rather than in crippling the language. After all, == is generally seen as a transitive operator, meaning that from a==b and b==c we can deduce a==c. But if any comparison to True were to report equality when the other operand was a true value of any type, atrocities like 6==True==7 would hold true, from which one could infer the falsehood 6==7. That's unacceptable.

Newbies should also be reminded that there's never a reason to write

if bool(x): ...

since the bool is implicit in the "if". Explicit is not better than implicit here, since the added verbiage impairs readability and there's no other interpretation possible.

There is, however, sometimes a reason to write

b = bool(x)

This is useful when it is unattractive to keep a reference to an arbitrary object x, or when normalization is required for some other reason.

—Guido van Rossum, PEP 285

PEP8 explicitly says:

For sequences, (strings, lists, tuples), use the fact that empty sequences are false.

5   IO

Use fd = open(filename) to open a file object. Make sure to call fd.close() after use; otherwise the file stays open until the object is garbage-collected, which is not guaranteed to happen promptly. A simple way to close files after use is to use open() as a context manager.
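A sketch of the context-manager form; a temporary file is used only so the example is self-contained:

```python
import os
import tempfile

handle, path = tempfile.mkstemp()
os.close(handle)

# The with-statement closes the file on exit from the block,
# even if an exception is raised inside it -- no fd.close() needed.
with open(path, 'w') as fd:
    fd.write('hello\n')
assert fd.closed  # closed as soon as the block ends

with open(path) as fd:
    assert fd.read() == 'hello\n'
os.remove(path)
```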

You should use 'rU' for reading text files because universal-newlines mode translates '\r\n' and '\r' line endings to '\n', so code behaves the same across platforms.

6   Static analysis

To correct a from module import * in a module that already has many errors which you don't want to correct, you can use the --select=F option to select only PyFlakes errors:

flake8 adroll/dotcom/controllers/backstage/ --select=F

7   Paradigms

7.1   Object-oriented programming

All data in a Python program is represented by objects or by relations between objects. [32]

Every object has an identity, a type, and a value. [32]

An object’s identity never changes once it has been created; you may think of it as the object’s address in memory. The is operator compares the identity of two objects; the id() function returns an integer representing its identity (currently implemented as its address). [32]

An object’s type is also unchangeable. An object’s type determines the operations that the object supports (e.g., “does it have a length?”) and also defines the possible values for objects of that type. The type() function returns an object's type (which is an object itself). [32]

The value of some objects can change. Objects whose value can change are said to be mutable; objects whose value is unchangeable once they are created are called immutable. An object’s mutability is determined by its type; for instance, numbers, strings and tuples are immutable, while dictionaries and lists are mutable. [32]

Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. [32]

Some objects contain references to “external” resources such as open files or windows. It is understood that these resources are freed when the object is garbage-collected, but since garbage collection is not guaranteed to happen, such objects also provide an explicit way to release the external resource, usually a close() method. Programs are strongly recommended to explicitly close such objects. [32]

Some objects contain references to other objects; these are called containers. Examples of containers are tuples, lists and dictionaries. The references are part of a container’s value. In most cases, when we talk about the value of a container, we imply the values, not the identities of the contained objects; however, when we talk about the mutability of a container, only the identities of the immediately contained objects are implied. So, if an immutable container (like a tuple) contains a reference to a mutable object, its value changes if that mutable object is changed. [32]
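These rules about identity, mutability, and container values can be seen directly:

```python
a = [1]
b = a                    # two names, one object
assert b is a and id(b) == id(a)
a.append(2)              # mutation through one name is visible via the other
assert b == [1, 2]

t = (1, [2])             # immutable container holding a mutable object
t[1].append(3)           # the tuple's identity and structure are unchanged,
assert t == (1, [2, 3])  # but its *value* has changed
```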

7.1.1   Classes

A class is a user-defined type.

All methods must take self as an explicit argument. [explicit over implicit]

  • class methods

Classes should always inherit from object (or another class that does). This creates a "new-style class", which is required for descriptors, properties, super(), __slots__, and other modern features to work.

7.1.2   Properties & descriptors

A descriptor is an object attribute with "binding behavior": attribute access on it is overridden by methods of the descriptor protocol (__get__, __set__, and __delete__).

A property is a built-in descriptor that routes attribute access through getter, setter, and deleter functions.

  • used to replace getter/setter methods
  • i.e., replaces awkward methods like self.get_first_name() with just self.first_name
  • property on foo creates a getter for foo
  • foo.setter is then used to add a setter
  • can be inherited, extended, supered, etc., like any function

I think descriptors are basically useless now given the syntactic sugar of properties
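The bullet points above can be sketched with a hypothetical Person class; property replaces get_first_name()-style methods with plain attribute access:

```python
class Person(object):
    def __init__(self, first_name):
        self._first_name = first_name

    @property
    def first_name(self):
        # Getter: runs on plain attribute access, p.first_name
        return self._first_name

    @first_name.setter
    def first_name(self, value):
        # Setter: runs on assignment, p.first_name = "..."
        self._first_name = value.strip()

p = Person("Guido")
p.first_name = "  Ada  "   # goes through the setter
assert p.first_name == "Ada"
```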

7.1.3   Bound and unbound methods

TODO: Read

In Python, there is a distinction between bound and unbound methods. A call to a bound method like c.foo() is translated to a call to an unbound method, C.foo(c). You can use staticmethod() to tell the built-in default metaclass, type, not to create a bound method for the decorated function.

Imagine the following class:

>>> class C(object):
...     def foo(self):
...         pass

You get back an unbound method if you access the foo attribute on the class:

>>> C.foo
<unbound method C.foo>

However, inside the class storage there is a function:

>>> C.__dict__['foo']
<function foo at 0x...>

The class of your class implements a __getattribute__ that resolves descriptors. Thus, C.foo is roughly equivalent to [71]:

>>> C.__dict__['foo'].__get__(None, C)
<unbound method C.foo>

Functions have a __get__ method, which makes them descriptors. If you have an instance c of the class, the instance replaces None [71]:

>>> C.__dict__['foo'].__get__(c, C)
<bound method C.foo of <__main__.C object at 0x...>>

Python does this because the method object binds the first parameter of a function to the instance of the class. That's where the self comes from. If you don't want your class to make a function a method, use staticmethod, which implements a dummy __get__ that returns the wrapped function as a function and not as a method [71]:

>>> class C(object):
...     @staticmethod
...     def foo():
...         pass
>>> C.__dict__['foo'].__get__(None, C)
<function foo at 0x...>

Bound methods of an unhashable object are also unhashable:

>>> d = {}
>>> d.clear.__self__
{}
>>> d.clear.__self__ is d
True
>>> hash(d.clear)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict'
>>> hash(dict.clear)

7.1.4   Magic Methods

__del__
    This should not be overridden because ...

__new__
    This can be overridden to subclass immutable types.

__subclasses__ is affected by imports, which means they have side effects. Consider the following modules:


# a.py
class A(object):
    pass

# b.py
from a import A

class B(A):
    pass

>>> from a import A
>>> A.__subclasses__()
[]
>>> from b import B
>>> A.__subclasses__()
[<class 'b.B'>]

7.1.5   Inheritance

super() is a Python built-in, first introduced in Python 2.2 and slightly improved and fixed in later versions. [68] super() is widely misunderstood, partly because it was poorly and incorrectly documented until Python 2.6. [68]

super() does not return the superclass. In fact, there is no such thing as "the" superclass in a multiple inheritance (MI) world. [68] To see this, consider the following inheritance tree:

class T(object):
    a = 0

class A(T):
    pass

class B(T):
    a = 2

class C(A, B):
    pass
Most people would probably say the superclass of C is A, since it comes before B. However, if super(C, c) (where c is an instance of C) returned the superclass of C, then super(C, c).a should be 0, the value A inherits from T. Instead, super(C, c) walks through the MRO of C, which is [C, A, B, T, object], so super(C, c).a evaluates to 2. [68] Using the word "superclass" in documentation should be avoided altogether.

super(type) is a proxy object that delegates method calls to a parent or sibling class of type. This is useful for accessing inherited methods that have been overridden in a class. [65]

super is a class overriding the __getattribute__ method. [68] Instances of super are proxy objects providing access to the methods in the MRO.

One big problem with 'super' is that it sounds like it will cause the superclass's copy of the method to be called. This is simply not the case, it causes the next method in the MRO to be called. [67]

That misconception causes people to mistakenly omit calls to super(...).__init__ if the only superclass is 'object', as, after all, object.__init__ doesn't do anything! However, this is very incorrect. Doing so will cause other classes' __init__ methods to not be called. [67]

There are two typical use cases for super. In a class hierarchy with single inheritance, super can be used to refer to parent classes without naming them explicitly, thus making the code more maintainable. This use closely parallels the use of super in other programming languages. [65]

The second use case is to support cooperative multiple inheritance in a dynamic execution environment. This use case is unique to Python and is not found in statically compiled languages or languages that only support single inheritance. This makes it possible to implement “diamond diagrams” where multiple base classes implement the same method. Good design dictates that this method have the same calling signature in every case (because the order of calls is determined at runtime, because that order adapts to changes in the class hierarchy, and because that order can include sibling classes that are unknown prior to runtime). [65]

Before super() was introduced, we would have hardwired the call with dict.__setitem__(self, key, value). However, super() is better because it is a computed indirect reference. [8]
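The example in question (from [8]) is a dict subclass that logs every assignment; a minimal sketch consistent with that text:

```python
import logging

class LoggingDict(dict):
    def __setitem__(self, key, value):
        # Log the write, then delegate to the next class in the MRO.
        logging.info('Setting %r to %r', key, value)
        super(LoggingDict, self).__setitem__(key, value)

d = LoggingDict()
d['answer'] = 42
assert d['answer'] == 42
```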

One benefit of indirection is that we don’t have to specify the delegate class by name. If you edit the source code to switch the base class to some other mapping, the super() reference will automatically follow. [8]

Further, since the indirection is computed at runtime, we have the freedom to influence the calculation so that the indirection will point to some other class. [8]

The calculation depends on both the class where super is called and on the instance’s tree of ancestors. The first component, the class where super is called, is determined by the source code for that class. In our example, super() is called in the LoggingDict.__setitem__ method. That component is fixed. The second and more interesting component is variable (we can create new subclasses with a rich tree of ancestors).

Let’s use this to our advantage to construct a logging ordered dictionary without modifying our existing classes:

class LoggingOD(LoggingDict, collections.OrderedDict):
    pass

The ancestor tree for our new class is: LoggingOD, LoggingDict, OrderedDict, dict, object. For our purposes, the important result is that OrderedDict was inserted after LoggingDict and before dict! This means that the super() call in LoggingDict.__setitem__ now dispatches the key/value update to OrderedDict instead of dict.

Think about that for a moment. We did not alter the source code for LoggingDict. Instead we built a subclass whose only logic is to compose two existing classes and control their search order.

What I’ve been calling the search order or ancestor tree is officially known as the Method Resolution Order or MRO.

Methods called by super() must exist; attributes accessed dynamically (e.g. with __getattr__) will not work:

>>> class Parent(object):
...    def __getattr__(self, k):
...        return 1
...    def __len__(self):
...        return 1
>>> class Child(Parent):
...    def foo(self):
...        return super(Child, self).foo()
...    def len_1(self):
...        return super(Child, self).__len__()
...    def len_2(self):
...        return len(super(Child, self))
>>> child = Child()
>>> child.foo()
Traceback (most recent call last):
  File "", line 13, in <module>
  File "", line 10, in foo
    return super(Child, self).foo()
AttributeError: 'super' object has no attribute 'foo'
>>> child.len_1()
1
>>> child.len_2()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "", line 13, in len_2
    return len(super(Child, self))
TypeError: object of type 'super' has no len()

super(cls, self)[index] is not equivalent to super(cls, self).__getitem__(index); super objects are attribute descriptors. [65]

In your case, self is a string and has a __getitem__ method; nevertheless, since super(cls, self) is a descriptor and not a string, I would not expect it to have a __getitem__ method, nor the other methods of a string. [65]

7.1.6   Method resolution order

The Method Resolution Order (MRO) is the order in which base classes are searched for a member during lookup. [16] The process of solving those constraints is known as "linearization". There are a number of good papers on the subject, but to create subclasses with an MRO to our liking, we only need to know the two constraints: children precede their parents and the order of appearance in __bases__ is respected. [8] Therefore, given the following module:

class A(object):
    def x(self):
        print "A"

class B1(A):
    def x(self):
        super(B1, self).x()
        print "B1"

class B2(A):
    def x(self):
        super(B2, self).x()
        print "B2"

class B3(A):
    def x(self):
        super(B3, self).x()
        print "B3"

class C(B1, B2, B3):
    def x(self):
        super(C, self).x()
        print "C"

Then we get:

>>> C().x()
A
B3
B2
B1
C

However, if we delete the call to super() in B2, and rerun it, we get:

>>> C().x()
B2
B1
C

Therefore, every class in the MRO must call super() to ensure that every class in the hierarchy gets a chance to run its logic.

This behavior occurs if you arrange mixins which override methods in the wrong order. For example, in the following code MyMixin.setUp will never be invoked: the MRO is MyTestCase, unittest.TestCase, MyMixin, and unittest.TestCase, as a subclass of object, does not call super() in setUp, since its setUp is the base implementation; there is no object.setUp():

class MyMixin(object):
    def setUp(self):
        super(MyMixin, self).setUp()
        # ...

class MyTestCase(unittest.TestCase, MyMixin):
    pass

In any `C3 class system`_, every class declaration should be read as a single partial order; class A(B, C) should be understood to mean A < B < C: A is a subclass of B, and B is a subclass of C. Thus a declaration like the one above should be suspicious, since MyMixin depends on the setUp method provided by unittest.TestCase. [69]

Piet Delport argues that mixins should subclass whatever classes they depend on, in order to ensure that they take care of their own dependencies. Ned Batchelder argues the opposite: because mixins are not conceptually subclasses of their dependencies, they should not be implemented as such, and it is okay to call super() in mixins since we know they are only ever supposed to be used with their dependencies. [69] Who is right remains unclear to me; both solutions are functional.

Classes that inherit from multiple classes that define the same name use the name defined on the leftmost inherited class:

>>> class Left(object):
...     def trigger(self):
...         return "Left"
>>> class Right(object):
...     def trigger(self):
...         return "Right"
>>> class LeftRight(Left, Right):
...     pass
>>> class RightLeft(Right, Left):
...     pass
>>> LeftRight().trigger()
'Left'
>>> RightLeft().trigger()
'Right'

Therefore, in cases where a naming conflict exists between two parent classes, the intended parent class should appear on the left.
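Both constraints — children precede parents, and the order of __bases__ is respected — can be checked directly on the __mro__ attribute (class names hypothetical):

```python
class Base(object):
    pass

class Mid1(Base):
    pass

class Mid2(Base):
    pass

class Leaf(Mid1, Mid2):
    pass

# Leaf precedes its parents; Mid1 precedes Mid2 as in __bases__.
names = [cls.__name__ for cls in Leaf.__mro__]
assert names == ['Leaf', 'Mid1', 'Mid2', 'Base', 'object']
```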

For more information on this topic, see "The Python 2.3 Method Resolution Order" by Michele Simionato.

7.1.7   Metaclasses

A metaclass is a class whose instances are classes. M is a metaclass of C if: 1) M is a class; 2) C is a class; 3) type(C) is M. [44]

A metaobject is an object that manipulates, creates, describes, or implements other objects (including itself). The object that the metaobject is about is called the base object. Some information that a metaobject might store is the base object's type, interface, class, methods, attributes, parse tree, etc. Metaobjects are examples of the computer science concept of reflection, where a system has access (usually at run time) to its internal structure.

The first metaobject protocol was in the Smalltalk object-oriented programming language developed at Xerox PARC. The Common Lisp Object System (CLOS) came later and was influenced by the Smalltalk protocol. The CLOS model, unlike the Smalltalk model, allowed a class to have more than one superclass. This provides additional complexity in issues such as resolving which class has responsibility for handling messages defined on two different superclasses. One of the most influential books describing the metaobject protocol in CLOS was The Art of the Metaobject Protocol by Gregor Kiczales.

A metaobject protocol is one way to implement aspect-oriented programming languages. Many of the early founders of MOPs, including Gregor Kiczales have since moved on to be the primary advocates for aspect-oriented programming.

In Python, the builtin class type is a metaclass.
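The definition above can be checked directly in the interpreter; a quick sketch (Python 3 syntax):

```python
class C:
    pass

# By the definition above: type is a class, C is a class, and type(C) is type,
# so type is the metaclass of C.
print(type(C) is type)     # True
print(type(type) is type)  # True: type is its own metaclass
```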

A metaclass is a subclass of the builtin metaclass type. [44] Examples include type itself (since type(type) is type) and the following:

class DoNothingMeta(type):
    pass

A metaclass can be instantiated by specifying a string (the name), a tuple of classes (the bases) and a dictionary (the dict) [44]:

C = DoNothingMeta('C', (object,), {})

The metaclass instance is a class named C, with parent object and an empty dictionary. In Python 2.2+ you have at your disposal the so-called __metaclass__ hook, and the previous line can also be written as [44]:

class C(object):
    __metaclass__ = DoNothingMeta

Classes are used to model sets of objects; in the same sense, metaclasses are used to model sets of classes. [44]

In Python (and other languages), classes are themselves objects that can be passed around and introspected. Just as regular classes act as templates for producing instances, metaclasses act as templates for producing classes.

Python has always had metaclasses. The metaclass machinery became exposed much better with Python 2.2. Specifically, with version 2.2, Python stopped being a language with just one special (mostly hidden) metaclass that created every class object. Now, programmers can subclass the built-in metaclass type and even dynamically generate classes with varying metaclasses.

So far, we have seen the basics of metaclasses. Putting them to work is more subtle. The challenge of using metaclasses is that in typical OOP design, classes do not really do much. Class inheritance structures encapsulate and package data and methods, but one typically works with instances in the concrete.

Methods (i.e., of classes), like plain functions, can return objects. In that sense, it is obvious that class factories can be classes just as easily as they can be functions. In particular, Python 2.2+ provides a special class called type that is just such a class factory.

There is one feature of type descendants to be careful about; it catches everyone who first plays with metaclasses. The first argument to methods is conventionally called cls rather than self, because the methods operate on the produced class, not the metaclass. Actually, there is nothing special about this. All methods attach to their instances, and the instance of a metaclass is a class. A better name makes this more obvious:

>>> class Printable(type):
...     def whoami(cls): print "I am a", cls.__name__
>>> Foo = Printable('Foo',(),{})
>>> Foo.whoami()
I am a Foo
>>> Printable.whoami()
Traceback (most recent call last):
TypeError:  unbound method whoami() [...]

There are two general categories of programming tasks where I think metaclasses are genuinely valuable.

The first, and probably more common, category is where you do not know at design time exactly what a class needs to do. Obviously, you will have some idea about it, but some particular detail might depend on information that will not be available until later. "Later" itself can be of two sorts: a), when a library module is used by an application, and b), at runtime when some situation exists. This category is close to what is often called "Aspect Oriented Programming" (AOP). [45] Consider the following example:

The construct:

class MyClass(object):

is identical to the construct:

MyClass = type('MyClass', (), {})

MyClass is an instance of type type, and that can be seen explicitly in the second version of the definition.

The arguments to the type constructor are:

  1. name is a string giving the name of the class to be constructed
  2. bases is a tuple giving the parent classes of the class to be constructed
  3. dct is a dictionary of the attributes and methods of the class to be constructed
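The three arguments can be seen in action; a small sketch (the greet function and the MyClass name are illustrative):

```python
def greet(self):
    return 'hello from ' + type(self).__name__

# name='MyClass', bases=(object,), dct maps attribute and method names.
MyClass = type('MyClass', (object,), {'greet': greet, 'x': 1})

obj = MyClass()
print(obj.greet())  # hello from MyClass
print(MyClass.x)    # 1
```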

The following:

Interface = InterfaceMeta('Interface', (), dict(file='tmp.txt'))


is identical to:

class Interface(object):
    __metaclass__ = InterfaceMeta
    file = 'tmp.txt'


By defining the __metaclass__ attribute of the class, we've told the class that it should be constructed using InterfaceMeta rather than using type. To make this more definite, observe that the type of Interface is now InterfaceMeta:

>>> type(Interface)
<class '__main__.InterfaceMeta'>

The Django project makes use of these sorts of constructions to allow concise declarations of very powerful extensions to their basic classes.
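The definition of InterfaceMeta is not shown above; the following is a hypothetical sketch of such a metaclass in Python 3 syntax, where the metaclass is passed as a class keyword argument rather than via the __metaclass__ attribute:

```python
# Hypothetical stand-in for InterfaceMeta: it records the non-dunder
# attribute names of every class it creates.
class InterfaceMeta(type):
    def __new__(mcls, name, bases, dct):
        cls = super().__new__(mcls, name, bases, dct)
        cls.registry = sorted(k for k in dct if not k.startswith('__'))
        return cls

# Python 3 spelling of the __metaclass__ hook:
class Interface(metaclass=InterfaceMeta):
    file = 'tmp.txt'

print(type(Interface).__name__)  # InterfaceMeta
print(Interface.registry)        # ['file']
```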

8   Philosophy

The philosophy of Python was concisely summed up by Tim Peters in The Zen of Python (PEP 20) (accessible via import this):

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one -- and preferably only one -- obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than right now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Other mottos:

8.1   Syntax

len() is not a method because van Rossum translated ABC's #x notation and because he likes the way it looks better. [76] Van Rossum also argues that for some operations, prefix notation reads better than postfix, and that len() defines a sort of soft interface which guarantees that the result will be an integer. (Relying on a method would be especially problematic in a duck-typed language, where two classes may share the same method name but not the same semantics.) [77]

8.2   Abbreviations in names

Some reviewers have argued for boolean instead of bool, because this would be easier to understand (novices may have heard of Boolean algebra but may not make the connection with bool) or because they hate abbreviations. My take: Python uses abbreviations judiciously (like 'def', 'int', 'dict') and I don't think these are a burden to understanding. To a newbie, it doesn't matter whether it's called a waffle or a bool; it's a new word, and they learn quickly what it means.

—Guido van Rossum, PEP 285

8.3   Exception handling

Python's exception handling philosophy is `Easier to Ask Forgiveness than Permission`_. This philosophy makes it harder to do the "wrong" thing (e.g. failing to check the return value of some system call). [PEP310]

For resource cleanup, the original way to do this was a try/finally statement:
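A minimal sketch of the try/finally resource-cleanup pattern that PEP 310 describes, using a lock as the example resource:

```python
import threading

lock = threading.Lock()

# Acquire before the try block; the finally clause guarantees release
# whether or not the body raises.
lock.acquire()
try:
    result = 6 * 7  # work with the protected resource
finally:
    lock.release()

print(result)         # 42
print(lock.locked())  # False
```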


This syntax separates the acquisition and release by a block of code, which makes it difficult to confirm "at a glance" that the code manages the resource correctly. [PEP310]

Another common error is to code the "acquire" call within the try block, which incorrectly releases the block if the acquire fails. [PEP310]

Using try-except statements has another advantage; it avoids race conditions. [22] Consider the following:

    try:
        os.remove(path)
    except OSError:
        pass  # the file was already gone

The look-before-you-leap alternative, checking os.path.exists(path) before calling os.remove(path), is racy: another process can delete the file between the check and the call.

Another reason for throwing exceptions rather than returning None is that code should fail as soon as possible, and exceptions that are thrown and not caught fail immediately.

Without using exceptions for control flow, we could not do something like:

BRACKETS = {
    '(': ')',
    '{': '}',
    '[': ']',
}

def brackets(chars):
    """Return True if the brackets in chars are balanced.

    >>> brackets('[]')
    True
    >>> brackets('[')
    False
    >>> brackets(']')
    False
    >>> brackets('[[]')
    False
    >>> brackets('[]]')
    False
    >>> brackets('[({}[])]')
    True
    >>> brackets('[(])')
    False
    """
    stack = []
    for char in chars:
        if char in BRACKETS:
            stack.append(char)
        else:
            try:
                if BRACKETS[stack.pop()] != char:
                    return False
            except IndexError:
                return False
    return not stack

Instead we would have to rewrite it as the following, which introduces a new name (l_char):

BRACKETS = {
    '(': ')',
    '{': '}',
    '[': ']',
}

def brackets(chars):
    """Return True if the brackets in chars are balanced.

    >>> brackets('[]')
    True
    >>> brackets('[')
    False
    >>> brackets(']')
    False
    >>> brackets('[[]')
    False
    >>> brackets('[]]')
    False
    >>> brackets('[({}[])]')
    True
    >>> brackets('[(])')
    False
    """
    stack = []
    for char in chars:
        if char in BRACKETS:
            stack.append(char)
        elif len(stack) > 0:
            l_char = stack.pop()
            if BRACKETS[l_char] != char:
                return False
        else:
            return False
    return not stack

Exceptions force the programmer to handle exceptional cases correctly. Consider the SQLAlchemy query methods Query.first() and Query.one(): first() returns either None or the first object which satisfies some query; one() returns the single object which satisfies some query, and raises an exception if no object, or more than one object, satisfies that query. The two have slightly different semantics and are not completely substitutable; however, it is important to note that if you expect exactly one or zero results you should use one() instead of first(), since using first() without checking its return value will let control flow normally until an error occurs later in the code.

8.4   Modules & Packages

A module is a block of code imported by some other code; the basic unit of code reusability in Python. There are two kinds of modules: pure modules and extension modules. Pure modules are modules that are contained in a single .py file. An extension module is a module written in the language of the Python implementation: C/C++ for CPython, Java for Jython. [39] [40]

8.4.1   Packages

A package is a module that contains other modules. [39] Python considers any directory that contains a file named __init__.py a Python package. [39]

The __init__.py file is usually empty, but can be used to export selected portions of the package under more convenient names, hold convenience functions, execute initialization code for the package (for example logging), or set the __all__ variable.

The vast majority of the __init__.py files I write are empty, because many packages don't have anything to initialize.

One example in which I may want initialization is when at package-load time I want to read in a bunch of data once and for all (from files, a DB, or the web, say) -- in which case it's much nicer to put that reading in a private function in the package's __init__.py rather than have a separate "initialization module" and redundantly import that module from every single real module in the package (uselessly repetitive and error-prone: that's obviously a case in which relying on the language's guarantee that the package's __init__.py is loaded once before any module in the package is obviously much more Pythonic!).

—Alex Martelli

How much code should I be throwing in a Python module?

Think in terms of a "logical unit of packaging" -- which may be a single class, but more often will be a set of classes that closely cooperate. Classes (or module-level functions -- don't "do Java in Python" by always using static methods when module-level functions are also available as a choice!-) can be grouped based on this criterion. Basically, if most users of A also need B and vice versa, A and B should probably be in the same module; but if many users will only need one of them and not the other, then they should probably be in distinct modules (perhaps in the same package, i.e., a directory with an __init__.py file in it).

The standard Python library, while far from perfect, tends to reflect (mostly) reasonably good practices -- so you can mostly learn from it by example. E.g., the threading module of course defines a Thread class... but it also holds the synchronization-primitive classes such as locks, events, conditions, and semaphores, and an exception-class that can be raised by threading operations (and a few more things). It's at the upper bound of reasonable size (800 lines including whitespace and docstrings), and some crucial thread-related functionality such as Queue has been placed in a separate module, nevertheless it's a good example of what maximum amount of functionality it still makes sense to pack into a single module.

—Alex Martelli

My recommendation, per the Google's Python style guide, is to only ever import modules, not classes or functions (or other names) from within modules. Strictly following this makes for clarity and precision, and avoids subtle traps that may come when you import "stuff from within a module".

—Alex Martelli

A file modu.py in the directory pack/ is imported with the statement import pack.modu. This statement will look for an __init__.py file in pack and execute all of its top-level statements. Then it will look for a file named pack/modu.py and execute all of its top-level statements. After these operations, any variable, function, or class defined in modu.py is available in the pack.modu namespace.


You should never give a module a name which is already used by the standard library, to prevent name conflicts. For example, if you name a module token.py in an application using the Flask web framework, you may get an import error like this:

$ python
Traceback (most recent call last):
  File "", line 9, in <module>
    import werkzeug
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/werkzeug/", line 154, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/werkzeug/", line 67, in <module>
    from werkzeug._internal import _get_environ
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/werkzeug/", line 13, in <module>
    import inspect
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/", line 39, in <module>
    import tokenize
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/", line 31, in <module>
    from token import *
  File "/Users/jessicastewart/Desktop/token_app/", line 9, in <module>
    from flask import Flask, render_template, request
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/flask/", line 17, in <module>
    from werkzeug.exceptions import abort
ImportError: cannot import name abort


The __init__.py file of a package should not import (for its own use or to export) the names of any of its submodules, since Python will bind a submodule's name in the package's namespace whenever the submodule is loaded by any mechanism. For example, consider a package testapp with the following structure:

testapp
├── __init__.py
├── utils.py
└── api
    ├── __init__.py
    └── utils.py

If testapp.api.__init__ imports testapp.utils as utils and testapp.api.utils is then imported by any mechanism, the name testapp.api.utils will refer to the testapp.api.utils submodule instead of the testapp.utils module.

If you are only importing functions, you can fix this simply by importing fully qualified modules instead.

This problem can be more troublesome when submodules share the name of a function the parent module wants to export, as might be the case when decomposing a monolithic module into submodules for each of its primary functions.

8.5   Namespaces

Namespaces are not the same as modules. I suspect modules came later.

Namespaces are an interesting feature. In C, everything is just a giant global / local namespace, same with Icon, probably that whole generation. Python is the same way, except that imports do not just concatenate files, but prefix them with names, greatly reducing the chance of name collisions.

  • builtins
  • globals
  • lexically enclosed
  • locals
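Name lookup proceeds from the innermost of these namespaces outward: locals first, then lexically enclosing scopes, then globals, then builtins. A small sketch:

```python
x = 'global'

def outer():
    x = 'enclosing'
    def inner():
        x = 'local'
        # The nearest binding wins: the local x here; the enclosing x if
        # this binding were removed; then the global x; then builtins.
        return x
    return inner()

print(outer())     # local
print(len('abc'))  # 3: len is found in the builtins namespace
```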

8.5.1   Name binding

vars()['x'] = 1
print x # 1

import sys

def f():
    frame = sys._getframe(1)
    frame.f_locals['x'] = 1

f()
print x # 1
  • Evil use of list comprehensions (does not work in python 3)
  • import *

8.6   Scripting

Scripts in Python should have a structure like this [52]:

"""Module docstring

Long usage message.
import argparse
import sys

def main(args):
    parser = argparse.ArgumentParser()
    return 0

def foo(a=None):

if __name__ == "__main__":

This builds off of Guido's approach in "Python main() functions" (2003) in that main() takes argv and returns an exit code (rather than invoking sys.exit directly), which allows us to call it from the interactive Python prompt, but differs in that it uses argparse instead of getopt, which has since been deprecated. [52] (It also takes `Ned Batchelder`_'s suggestion of not defaulting argv to sys.argv, since Guido's reasons are no longer relevant with argparse.) This can be tested via:

import mock

from foo import main

# Patch out foo so the tests exercise only the argument parsing.
@mock.patch('foo.foo')
def test_main(foo):
    main([])
    foo.assert_called_once_with(None)

@mock.patch('foo.foo')
def test_main_a(foo):
    main(['-a', '1'])
    foo.assert_called_once_with('1')

Use if __name__ == '__main__' to check if a program is being run from the command line. This makes a module both importable and executable.

8.7   Introspection / The interpreter

Three main tools: dir, type, and id.

  • reload
  • Use _ to refer to the last evaluated value.
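The three tools can be sketched in two lines each:

```python
xs = [1, 2, 3]
print(type(xs) is list)     # True: type reports an object's class
print('append' in dir(xs))  # True: dir lists an object's attributes
print(id(xs) == id(xs))     # True: id is constant for an object's lifetime
```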

8.8   Information Hiding

No information is truly hidden in Python.

Unlike languages like Java, where information hiding is often viewed as a matter of security, Python trusts the programmer not to toy with hidden variables and protects them only through convention. As Guido puts it, "we are all adults". (There is a theme here.)

A single-leading underscore is used to indicate something is private, e.g. _x. Private names are not exported when using the from m import * notation and are not included in calls to help().
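The `from m import *` behavior can be demonstrated without a separate file by building a throwaway module object (the module name m and its attributes are illustrative):

```python
import sys
import types

# A throwaway module: when no __all__ is defined, `import *` skips
# names that begin with an underscore.
m = types.ModuleType('m')
m.public = 1
m._private = 2
sys.modules['m'] = m

ns = {}
exec('from m import *', ns)
print('public' in ns)    # True
print('_private' in ns)  # False
```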

8.8.1   Name-mangling

Name mangling only occurs with two leading underscores AND no trailing underscores:

__x     # mangled
__add__ # not mangled

A double leading underscore is used to indicate a name is private to its class and should not be clashed with by subclasses. Using two leading underscores (and at most one trailing underscore) causes name-mangling to occur.

The main reason for making (nearly) everything discoverable was debugging: when debugging you often need to break through the abstractions (since bugs don't confine themselves to the nice abstractions you've created for your program :-) so I thought it would be handy to be able to see anything from the debugger. And since the debugger is written in Python itself (for flexibility and a number of other reasons) I figured the same would apply to other forms of programming -- after all, sometimes debugging doesn't imply using a debugger, it may just imply printing a certain value. Again, too much data hiding would make things more complicated here. [10]

The other observation was that even in C++, there are usually ways around the data hiding (e.g. questionable casts). Which made me realize that apparently other languages could live just fine with less-than-perfect hiding, and that hiding was an advisory mechanism, not an enforcement mechanism. So Python could probably be just fine with even-less-than-perfect hiding. :-) [10]

Any identifier of the form __spam (at least two leading underscores, at most one trailing underscore) is now textually replaced with _classname__spam, where classname is the current class name with leading underscore(s) stripped. [1]

Name mangling is intended to give classes an easy way to define "private" instance variables and methods, without having to worry about instance variables defined by derived classes, or mucking with instance variables by code outside the class. Note that the mangling rules are designed mostly to avoid accidents; it still is possible for a determined soul to access or modify a variable that is considered private. This can even be useful, e.g. for the debugger, and that's one reason why this loophole is not closed. [1]

>>> class MyClass:
...     def myPublicMethod(self):
...             print 'public method'
...     def __myPrivateMethod(self):
...             print 'this is private!!'
>>> obj = MyClass()
>>> obj.myPublicMethod()
public method
>>> obj.__myPrivateMethod()
Traceback (most recent call last):
File "", line 1, in
AttributeError: MyClass instance has no attribute '__myPrivateMethod'
>>> dir(obj)
['_MyClass__myPrivateMethod', '__doc__', '__module__', 'myPublicMethod']
>>> obj._MyClass__myPrivateMethod()
this is private!!

8.9   Anti-patterns

This document is a list of common anti-patterns collected from TAing Penn's Python class.

8.9.1   EAFP vs. Expressions

This is a more general problem, which is to say that expressions and exceptions don't work well together.

In this case, we see that what ought to be a list comprehension can't be if we adhere to the EAFP principle:

    def process_text(text, mappings):
        encoded_text = []
        for i in text:
            try:
                encoded_text.append(mappings[i])
            except KeyError:
                encoded_text.append(i)
        return "".join(encoded_text)

    def process_text(text, mappings):
        return "".join(mappings[i] if i in mappings else i for i in text)

    def process_text(text, mappings):
        return "".join(mappings.get(i, i) for i in text)

By default, favor semantics over speed:

def is_sorted(xs):
    return sorted(xs) == xs

def is_sorted(xs):
    for x, y in zip(xs, xs[1:]):
        if x > y:
            return False
    return True

Instead of:

    if x == a or x == b or ... or x == z:

write:

    if x in (a, b, ..., z):

8.9.2   Setting defaults by using the value of the last operand

>>> a = 0 or 2
>>> a
2

One might expect a to evaluate to True; instead it evaluates to 2. Not really a problem in the context of an if statement, but awkward for assignment.

Can be useful in cases where one might otherwise use a ternary, e.g. x = 0 or 2 instead of x = 0 if 0 else 2.

Also can be useful to shorten calling methods/functions, e.g. (0 or 3).bit_length() or len([] or [1, 2, 3]).
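The behavior behind these idioms is that `or` returns its first truthy operand, else its last operand; a small sketch (greet is a hypothetical helper):

```python
def greet(name=None):
    # `or` supplies a default: if name is falsy (None, ''), use 'world'.
    return 'hello ' + (name or 'world')

print(greet())                # hello world
print(greet('guido'))         # hello guido
print(0 or 2)                 # 2
print((0 or 3).bit_length())  # 2: 3 is 0b11
```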

Another consequence of the compatibility requirement is that the expression "True and 6" has the value 6, and similarly the expression "False or None" has the value None. The "and" and "or" operators are usefully defined to return the first argument that determines the outcome, and this won't change; in particular, they don't force the outcome to be a bool. Of course, if both arguments are bools, the outcome is always a bool. It can also easily be coerced into being a bool by writing for example "bool(x and y)".

—Guido van Rossum, PEP 285

9   Usage

If there is one defining feature of modern Python, it's that the definition of Python itself is becoming increasingly blurred. Many projects over the last few years have taken larger and larger leaps to extend Python and reconstruct what “Python” itself means.

At the same time, there are a variety of technologies encroaching on Python's niche, each bringing a variety of advantages:

Python's responses:

10   Applications

10.2   Web development

Django was released on July 21, 2005. Pylons began September 2005. Rails 1.0 was released on December 13, 2005. Rails 2.0 was released on December 7, 2007. Flask began development on April 6, 2010.

One of the simplest, most direct ways to build a Python Web app from scratch is to use the Common Gateway Interface (CGI) standard, which was a popular technique circa 1998. Here’s a high-level explanation of how it works: just create a Python script that outputs HTML, then save the script to a Web server with a ”.cgi” extension and visit the page in your Web browser. That’s it. [33]
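A minimal sketch of that CGI approach: a response is just header lines, a blank line, then HTML on standard output (the render helper is illustrative):

```python
def render():
    # A CGI response: headers, a blank line, then the HTML body.
    body = '<html><body><h1>Hello from CGI</h1></body></html>'
    return 'Content-Type: text/html\r\n\r\n' + body

print(render())
```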

TurboGears was originally created in 2005 by Kevin Dangoor as the framework behind the as yet unreleased Zesty News product.

The Pyramid framework began life in the Pylons project and got the name Pyramid in late 2010, though the first release was in 2005. Django had its first release in 2005, shortly before the Pylons (eventually Pyramid) project began.

Django is known for its active community, good documentation, wide usage, easy-to-use ORM, automatic admin panel, template engine, form validation, and internationalization. Its drawbacks are that both the ORM and admin are limited.

TurboGears supports SQLAlchemy.

Frameworks such as Django_ and Pyramid.

Microframeworks such as Flask_ and Bottle.

Further reading:

10.3   Web crawling

Python was used in 1996 for Google's first successful web crawler:

Since Page wasn’t a world-class programmer, he asked a friend to help out. Scott Hassan was a full-time research assistant at Stanford, working for the Digital Library Project program while doing part-time grad work. Hassan was also good friends with Brin, whom he’d met at an Ultimate Frisbee game during his first week at Stanford.

Page’s program “had so many bugs in it, it wasn’t funny,” says Hassan. Part of the problem was that Page was using the relatively new computer language Java for his ambitious project, and Java kept crashing. “I went and tried to fix some of the bugs in Java itself, and after doing this ten times, I decided it was a waste of time,” says Hassan. “I decided to take his stuff and just rewrite it into the language I knew much better that didn’t have any bugs.” He wrote a program in Python—a more flexible language that was becoming popular for web-based programs—that would act as a “spider,” so called because it would crawl the web for data. The program would visit a web page, find all the links, and put them into a queue. Then it would check to see if it had visited those link pages previously. If it hadn’t, it would put the link on a queue of future destinations to visit and repeat the process.

Since Page wasn’t familiar with Python, Hassan became a member of the team. He and another student, Alan Steremberg, became paid assistants to the project. Brin, the math prodigy, took on the huge task of crunching the mathematics that would make sense of the mess of links uncovered by their monster survey of the growing web. Even though the small team was going somewhere, they weren’t quite sure of their destination. “Larry didn’t have a plan,” says Hassan. “In research you explore something and see what sticks.”

—Levy, Steven (2011-04-12). In The Plex (p. 18). Simon & Schuster, Inc.. Kindle Edition.
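The crawl loop Levy describes can be sketched with a queue over a toy link graph (the LINKS mapping below is a hypothetical stand-in for real pages):

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    'a': ['b', 'c'],
    'b': ['a', 'd'],
    'c': [],
    'd': ['c'],
}

def crawl(start):
    """Visit pages breadth-first, queueing links not seen before."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for link in LINKS.get(page, []):
            if link not in seen:   # only queue unvisited pages
                seen.add(link)
                queue.append(link)
    return order

print(crawl('a'))  # ['a', 'b', 'c', 'd']
```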

Python was released in 1991, five years before HTTP 1.0 and four years before Java.

10.5   Email

10.5.2   IMAPClient

Date:May 09, 2014

IMAPClient is an easy-to-use, Pythonic and complete IMAP client library.

Although IMAPClient actually uses the imaplib module from the Python standard library under the hood, it provides a different API. Instead of requiring that the caller perform extra parsing work, return values are fully parsed, readily usable, and use sensible Python types. Exceptions are raised when problems occur (no error checking of return values is required).

  • Compared to imaplib, it's a pleasure to use. imaplib provides a low-level interface to IMAP, is poorly documented, and has unintuitive behavior. (Some return values can be different types.)
  • Besides imaplib there are several libraries available. However, it is unclear whether they are still maintained.

10.6   Testing

I recommended using pytest_. Avoid using unittest or nose.

nose is a fork of pytest, and is much slower since it lacks parallelization.

11   History

In December 1989, van Rossum was looking for a "hobby" programming project that would keep him occupied during the week around Christmas. [18] His office (a government-run research lab in Amsterdam) would be closed, but he had a home computer, and not much else on his hands. [18] He decided to write an interpreter for the new scripting language he had been thinking about lately: a descendant of ABC that would appeal to Unix / C hackers. [18]

Van Rossum started working on Python at CWI in Amsterdam, and all Python development was done there from 1990-1995. [51] Major contributors during those days were Jack Jansen and Sjoerd Mullender. [51] Van Rossum also accepted many changes from elsewhere, but the commit history shows them as his, since contributors didn't have remote access to CWI's version control system (IIRC it was CVS with direct filesystem access, not even on a server). [51]


I believe I've tracked down the origin of the term Benevolent Dictator For Life (BDFL) to a Python meeting in 1995. It's a blast from the past!
Occasionally people ask me about the origins of my nickname BDFL (Benevolent Dictator For Life). At some point, Wikipedia claimed it was from a Monty Python skit, which is patently false, although it has sometimes been called a Pythonesque title. I recently trawled through an old mailbox of mine, and found a message from 1995 that pinpoints the origin exactly. I'm including the entire message here, to end any doubts that the term originated in the Python community.

Some background: On April 15 I had moved to the US to join CNRI for what would end up being a five-year stint. One of the first things we wanted to do was establish some kind of (semi-)formal group overseeing Python development and workshops. It was too early to think of conferences yet. The idea was to call this the Python Software Association (or perhaps the Python Software Activity), and to have it be a subsidiary of CNRI, which would give it many of the benefits of a non-profit (CNRI being one) without any of the hassle. On April 18 a group of folks interested in setting this up met: besides myself, there were Ken Manheimer and Mike McLay from NIST, Barry Warsaw, Roger Masse and Ted Strollo from CNRI, Jim Fulton from USGS, and Paul Everitt from Creative Minds, an early precursor of Zope Corporation.

As you can read below, everyone present was bestowed a title starting with First Interim, but mine was the only jocular one. While I can't prove my title (with or without the First Interim prefix) was never used before, I'm pretty certain that it originated in this meeting. Given what I know of how their minds work, it was most likely invented by Ken Manheimer or Barry Warsaw, though it may well have been a joint invention by all present. I doubt that anyone remembers (I certainly don't recall anything specifically about this meeting, there were so many meetings those days).

Anyway, here's the whole message, with all the headers. I've added some highlights to emphasize the most salient points.

Return-Path: <>
Received: from CNRI.Reston.VA.US ( []) by (8.6.9/8.6.9) with SMTP id RAA01703; Fri, 5 May 1995 17:34:51 -0400
Received: from by CNRI.Reston.VA.US id aa16056;
          5 May 95 17:34 EDT
Received: by (4.1/SMI-3.2-del.7-klm.4)
    id AA15998; Fri, 5 May 95 17:35:00 EDT
Date: Fri, 5 May 95 17:35:00 EDT
Message-Id: <>
From: Ken Manheimer
To: "Barry A. Warsaw" ,
        "Roger E. Masse" ,,
        Jim Fulton ,
        Guido van Rossum ,
        Michael McLay ,
        Kenneth Manheimer ,
        "Theodore R. Strollo"
Subject:  Notes from the last PSA meeting at CNRI - Tue, April 18, 1995
X-Mailer: VM 5.72 (beta) / Emacs 19.26.2
Organization: National Institute of Standards and Technology

Well, after a substantial delay as promised (:-), here are my notes
from the last PSA/workshop meeting at cnri.  Note that there are a few
items that we all need to get moving - paul, you have to post an
explanation of the recruitment-process for workshop session
conductors, and then all of us have to send out our solicitations.

Barry and roger, i was supposed to report to you the address of the
NIST time server - is the one i use.  I believe it
supports a number of network-time protocols - i use 'rdate' on the
suns and 'netdate' on my linux box with it.  I also understand that it
is coupled pretty closely with a NIST time-standard atomic clock.  It
is physically in boulder, but presumably the time synch mechanisms
account for the distance.  And anyway, who of us cares about
millisecond absolute accuracy?

Here are my notes, in a semi-outline format:

  Landmark first meeting of first interim PSA board, including
  first interim benevolent dictator-for-life, GvR, in attendance.

+ Attendees:

   Barry Warsaw, CNRI
   Guido van Rossum, CNRI
   Jim Fulton, USGS
   Ken Manheimer, NIST
   Michael McLay, NIST
   Paul Everitt, CMinds Inc.
   Roger Masse, CNRI

+ Python workshop

   ( my notes for the first part are sparse; after all, i wasn't the
     official notetaker until later in the meeting...)

   Not clear whether or not USGS will have the necessary internet/
       mbone connectivity - jim is investigating
   Discussions about mbone at workshop flailed around finding a
       station to base an sbus video board that barry has available, i
       may have a sparcstation IPC to bring.
   I was left with the impression that there are fundamental
       questions about whether the effort to set up an mbone broadcast
       is warranted.

 * **  Marshalling the agenda  **  action item!
    Paul agreed to be the overall workshop-session coordinator
    Agreed, on guido's suggestion, each of us would take
      responsibility for recruiting people (or taking it on
      ourselves) to handle a workshop session, and/or pieces of it.
    Division of labor:

   - Paul is going to post something explaining the overall scheme,
   - Administrative Topics and Introductions: paul
   - Distributed Computing: guido
   - Extension Modules and Basic Applications: mike, but jim's
                           emailing aaron waters
   - GUI: jim
   - Python Core: guido
   - Software Mgmt: ken

  ( Barry, roger: answer to incidental questions about reliable NIST
  time server, slaved to the atomic clock -  It
  apparently supports several time protocols, i use rdate on my
  sun, netdate on my linux system, just 'cause that's what's built

+ Discussions re PSA

 - Some suggested purposes of the PSA:
    Give python credentials - "python is not just any old software
       off the net", including visibility and formal contact point
       for python-related questions
    Coordination of python development and commercial activity
    Stability of python - branding, forum for fielding user issues, etc
    Network host making available python and PSA materials

 - Proposal we're (mike?) going to make at python workshop:

    PSA will be a user group, eventually have a network host, and
    there are efforts in the works for funding (by cnri) to make it a
    staffed organization.

 - First Interim Board of Directors - a sundry collection of a motley crew:

  * First Interim Chairman: Mike McLay
  * First Interim Keepers of 1st interim board, @CNRI
  * First Interim Keeper of the Notes: Ken Manheimer
  * First Interim Keeper of the Materials Index: Paul Everitt
  * First Interim Treasurer: decision postponed until there's money
  * First Interim Workshop Coordinator: Paul Everitt
  * First Interim Benevolent Dictator for Life: Guido van Rossum

 - (see "1st interim keeper of...", above):

    A claim on the address has been filed with the NIC, by roger masse
        it may (?) informally be active, but will only be announced
        once cnri does or does not make some arrangement for funding

    We will wait to redirect the python mailing list (
        until cnri has officially established a place for

    We will relocate the steering-committed list (
        to the host asac (As Soon As Convenient) (barry?)

 - Discussion of a procedure for conducting python development proposals
    All agree that it would be nice to have a regular procedure for
        fielding and registering proposals for changes of and
        additions to python.
    Discussion of jim's recent proposal for a generic object API
        poses a nice example of several components of such a

  . Purposes of procedure:

    To help coordinate the process, so independent groups aren't
        working separately on the same problem/issue
    Establish formal collection of proposals, so:
        people can find what's already gone before, and how they went
        people working on implementation can have a central
        collection to focus upon
  . Very preliminary draft of proposal-submission procedure
    : Champion submits initial proposal to mailing list
    : Champion fields comments, discussion
    : If still interested, champion submits followup proposal, for
          inclusion in "PSA Notes" repository
  . Notice (who could help it?) that nothing is said so far about
        formalisms for getting the proposal implemented!
  . jim, guido, and i agreed to discuss this further

ken, 301 975-3539

PEP index + PEP 1 created in 2000.

Tim Peters published The Zen of Python in 2004.

I've never liked this list, mostly due to its characterization of Python, but I think other languages are also incorrectly characterized. Guido worked on ABC and wanted something for scripting tasks on a Unix machine. Python was mostly inspired by ABC, but some of the syntax was inspired by Modula-3. Python is certainly not a reaction to Perl; I'm not sure Guido was even aware of it when Python was born (they are close to the same age, actually).

Goodger and Warsaw created the Python Enhancement Proposals on July 13 2000.

The most complete information on versions can be found here:


11.1   Python 1.5.2


11.2   Python 1.6.1



Python 1.6 was the last of the versions developed at CNRI and the only version issued by CNRI with an open source license. Following the release of Python 1.6, and after Guido van Rossum left CNRI to work with commercial software developers, it became clear that the ability to use Python with software available under the GNU General Public License (GPL) was very desirable.

11.3   Python 2.0.1


11.4   Python 2.1.3


I can't find earlier 2.1.x versions.

On April 8 2002, we're releasing Python 2.1.3 - a bugfix release of Python 2.1

This release has only a couple of bug fixes. This is hopefully because the 2.1.x line is now stable enough that there are very few bugs still lurking. Unlike the 2.1.2 release, there has not been a wholesale attempt to port most bug fixes from the current python code to this release - only critical bugs are being fixed.

This appears to be the last 2.1.x release before 2.2.

11.5   Python 2.2.0


Python 2.2 can be thought of as the “cleanup release”. There are some features such as generators and iterators that are completely new, but most of the changes, significant and far-reaching though they may be, are aimed at cleaning up irregularities and dark corners of the language design. [25]

11.6   Python 2.3.0

  • Added enumerate().
  • Added logging to the standard library.
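The enumerate() builtin added in 2.3 replaces manual index bookkeeping by pairing each item with its position; a minimal illustration (the start parameter shown here arrived later, in 2.6):

```python
# enumerate() yields (index, item) pairs, counting from 0 by default
seasons = ['spring', 'summer', 'fall']
assert list(enumerate(seasons)) == [(0, 'spring'), (1, 'summer'), (2, 'fall')]

# An optional start offset was added in Python 2.6
assert list(enumerate(seasons, 1))[0] == (1, 'spring')
```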

11.7   Python 2.4.0


11.8   Python 2.5.0


11.9   Python 2.6.0


11.10   Python 2.7.0


Python 2.7 added:

  • Dictionary comprehensions (Python 2.7). For example, {i: 0 for i in range(3)}.
  • Set literal notation. For example, {1, 2, 3}
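Both additions can be checked at the interpreter; note that empty braces still denote an empty dict, not an empty set:

```python
# Dictionary comprehension: build a dict from an iterable (Python 2.7+)
assert {i: 0 for i in range(3)} == {0: 0, 1: 0, 2: 0}

# Set literal: braces with comma-separated elements
assert {1, 2, 3} == set([1, 2, 3])

# {} remains an empty dict for backward compatibility; use set() for an empty set
assert type({}) is dict
```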

11.11   Python 2.7.7


11.12   Python 2.7.8


11.13   Testing Goat

PyCon 2010 introduced the testing goat meme ("Be Stubborn. Obey the Goat.") during the Testing in Python Birds of a Feather session, where Terry Peppers used slides full of goats in his introduction. [59]

12   Criticism

>>> 1 + 2, + 3, + 4, (5,)
(3, 3, 4, (5,))
>>> 1 + 2, + 3, + (4, 5,)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: bad operand type for unary +: 'tuple'

12.1   Implicit string concatenation

Python will implicitly concatenate adjacent string literals. That is, 'a' 'b' == 'ab'. People often use this feature when writing multi-line strings; however, both Google and Guido consider it dangerous and unnecessary:

I just spent a few minutes staring at a bug caused by a missing comma -- I got a mysterious argument count error because instead of foo('a', 'b') I had written foo('a' 'b').

This is a common mistake, and IIRC at Google we even had a lint rule against this (there was also a Python dialect used for some specific purpose where this was explicitly forbidden).

Now, with modern compiler technology, we can (and in fact do) evaluate compile-time string literal concatenation with the '+' operator, so there's really no reason to support 'a' 'b' any more. (The reason was always rather flimsy; I copied it from C but the reason why it's needed there doesn't really apply to Python, as it is mostly useful inside macros.)

Would it be reasonable to start deprecating this and eventually remove it from the language?

—Guido van Rossum [57]
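The failure mode Guido describes is easy to reproduce; a minimal sketch, where foo is a hypothetical stand-in for any function sensitive to its argument count:

```python
def foo(*args):
    # Stand-in for a function that cares how many arguments it receives
    return len(args)

assert foo('a', 'b') == 2   # the intended call
assert foo('a' 'b') == 1    # missing comma: 'a' 'b' silently becomes 'ab'
assert 'a' 'b' == 'ab'
```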

12.2   Cannot assign as an expression

Python does not permit assignment as an expression. For example:

>>> a = 1 + (x = 0)
  File "<stdin>", line 1
    a = 1 + (x = 0)
SyntaxError: invalid syntax

In part, this is to avoid ambiguity with equality:

>>> (a == 0)

12.3   Cannot assign to bound variables in closures

In Python, assigning to a variable bound in a closure makes that name local to the inner function instead. For example, the following raises an UnboundLocalError when called, because last_result is read before its local assignment:

def run_command_on_change(trigger, action):
    last_result = None
    def wrapped():
        result = trigger()
        # The assignment below makes last_result local to wrapped(), so
        # this comparison raises UnboundLocalError at call time.
        if result is not None and result != last_result:
            action(result)
            last_result = result
    return wrapped

The solution is to use an object:

class CommandRunner(object):
    def __init__(self, trigger, action):
        self.trigger = trigger
        self.action = action
        self.last_result = None

    def __call__(self):
        result = self.trigger()
        if result is not None and result != self.last_result:
            self.last_result = result
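In Python 3, the nonlocal statement offers a lighter-weight fix than the class: it declares that assignment targets the enclosing function's binding. A sketch reusing the trigger/action shape from above:

```python
def run_command_on_change(trigger, action):
    last_result = None
    def wrapped():
        nonlocal last_result  # rebind the enclosing variable (Python 3 only)
        result = trigger()
        if result is not None and result != last_result:
            action(result)
            last_result = result
    return wrapped

calls = []
watcher = run_command_on_change(lambda: 42, calls.append)
watcher()
watcher()
assert calls == [42]   # action fired once; the second call saw no change
```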

12.4   Discouraged use of reduce leads to design patterns

Though reduce() exists in the language, its use is discouraged, and in Python 3 it was moved from the builtins into the functools module. Without reduce(), a whole class of folding routines cannot be abstracted and must each be written as an explicit loop. For example, calculating a sum or product:

def sum(nums):
    rv = 0
    for num in nums:
        rv += num
    return rv

def product(nums):
    rv = 1
    for num in nums:
        rv *= num
    return rv
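With reduce() (importable from functools in both Python 2.6+ and 3.x), the shared looping pattern collapses into a single abstraction; a sketch:

```python
from functools import reduce
import operator

def fold(op, initial):
    # Return a function that folds op over a sequence with a seed value,
    # abstracting the loop that sum() and product() otherwise duplicate.
    def folder(nums):
        return reduce(op, nums, initial)
    return folder

sum_ = fold(operator.add, 0)
product = fold(operator.mul, 1)

assert sum_([1, 2, 3, 4]) == 10
assert product([1, 2, 3, 4]) == 24
assert sum_([]) == 0 and product([]) == 1   # seeds handle the empty case
```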

12.6   Cannot repeat a block without assigning a name

To repeat a block n times, you must still name a loop variable; the convention is the throwaway name _:

for _ in range(n):
    do_something()  # hypothetical body

12.7   No construct to express do-while

Python cannot express do-while cleanly, probably to avoid introducing new syntax. The best seems to be:

while True:
    do_work()  # hypothetical body; always runs at least once
    if exit_condition:
        break
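A concrete instance of the idiom, assuming a hypothetical stream of readings that must be processed at least once:

```python
# Consume readings until a sentinel; the body always executes before the
# exit condition is checked, which is the do-while property being emulated.
readings = iter([3, 7, 0, 5])
seen = []
while True:
    value = next(readings)
    seen.append(value)
    if value == 0:   # exit condition tested after the body
        break

assert seen == [3, 7, 0]
```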

12.8   Can't use partials in a class

It would be useful to be able to use partials when defining methods on a class which are identical except for a parameter that they pass to a common (usually private) method. However, it doesn't work because Python will not implicitly pass self to the functions created from functools.partial:

import functools

class TestClient(object):
    def __init__(self, app): = app

    def send(self, method, url, headers=None, authorization=None, **kwargs):
        # (Body simplified; the original dispatched the request through
        # and handled basic-auth headers.)
        if headers is None:
            headers = {}
        return (method, url, headers)

    # None of these work: a partial object is not a descriptor, so self is
    # never bound. client.get('/') calls send('GET', '/'), which consumes
    # 'GET' as self and raises TypeError for the missing url argument.
    get = functools.partial(send, 'GET')
    post = functools.partial(send, 'POST')
    put = functools.partial(send, 'PUT')
    delete = functools.partial(send, 'DELETE')

12.9   APIs that return None instead of errors

If an API returns None instead of raising an exception, callers are forced into the following anti-pattern:

tmp = f()
rv = g() if tmp is None else tmp
return rv

Alternatively, this can be written as:

return f() or g()

However, this is still of arguable clarity; it assumes that the reader knows that or returns the first truth-y value (as opposed to True or False).
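The clarity concern is concrete: or falls through on any falsy value, not just None, so the two spellings can disagree. A sketch with hypothetical f and g:

```python
def f():
    return 0     # a falsy but perfectly valid result

def g():
    return 42    # the fallback

tmp = f()
rv = g() if tmp is None else tmp
assert rv == 0               # the explicit None test keeps the valid 0

assert (f() or g()) == 42    # `or` discards the falsy 0 and calls g()
```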

12.10   Multi-line if statements

Visual alignment with long conditionals causes issues:

if (collResv.repeatability is None or
    collResv.rejected = True

The operator should be at the end of the line.

This can be partially fixed by factoring out the condition, preferably into a function describing what the condition is:

cond = collResv.repeatability is None or collResv.somethingElse
if cond:
    collResv.rejected = True

PEP8 doesn't offer much advice.

This PEP explicitly takes no position on how or whether to further visually distinguish continuation lines after multi-line if statements. Some acceptable options include:

# No extra indentation.
if (this
    and that):

# Add a comment, which will provide some distinction in editors
# supporting syntax highlighting.
if (this
    and that):
    # Since both conditions are true, we can frobnicate.

# Add some extra indentation on the conditional continuation line.
if (this
        and that):

12.11   Subclass relationships aren't transitive

Subclass relationships are not transitive in Python. That is, if A is a subclass of B, and B is a subclass of C, it is not necessarily true that A is a subclass of C. The reason for this is that with PEP 3119, anyone is allowed to define their own, arbitrary __subclasscheck__ in a metaclass. [28]
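A sketch of how a metaclass-level __subclasscheck__ can break transitivity (all class names here are hypothetical):

```python
class OnlyNamedB(type):
    # Hypothetical hook: claim as subclasses exactly the classes named 'B'
    def __subclasscheck__(cls, subclass):
        return subclass.__name__ == 'B'

class C(metaclass=OnlyNamedB):
    pass

class B(object):
    pass

class A(B):
    pass

assert issubclass(B, C)          # via the custom hook
assert issubclass(A, B)          # via ordinary inheritance
assert not issubclass(A, C)      # transitivity broken
```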

Now, it's true that many other Python features trust the programmer to preserve invariants--for example, you can define an intransitive __le__ method, or mess up reference counts in a C extension. But even the classes defined in the standard library do not obey subclass transitivity! Here's a simple example demonstrating this:

>>> from collections import Hashable
>>> issubclass(list, object)
>>> issubclass(object, Hashable)
>>> issubclass(list, Hashable)

For example, Iterable is a "subclass" of Hashable--seriously, because Iterable has the __hash__ method it inherits from object--but 267 iterable classes are not hashable. [28]

So are Python's intransitive subclass relationships a big deal? For day-to-day coding, it's probably not the end of the world: I'd be surprised if very many people have written code relying on subclass transitivity. And in a community which emphasizes duck-typing, extensive use of issubclass and isinstance is somewhat frowned upon anyway. [28]

But I do think this is a big deal from a language beauty and simplicity perspective. Intransitive subclass relationships defy the common-sense understanding of what it means for A to be a subclass of B. They turn the naive, simple concept of subclasses into a complicated one involving metaclasses, abstract classes, subclasschecks, and subclasshooks. In short, I believe subclass intransitivity goes against the spirit of several of the pythonic guidelines written by Tim Peters in his wonderful "The Zen of Python". [28]

13   Distribution

Although the standard library covers many programming needs, you may need to add functionality to your Python installation in the form of third-party modules. This might be necessary to support your own programming, or to support an application that you want to use and that happens to be written in Python. [34]

13.1   History

In the early days of Python, there was no such thing as a distribution. If you wanted to install someone else’s code, you had to download and build the code manually and move it into your PYTHONPATH yourself. This was not particularly simple or convenient.

A group of developers, the Distutils Special Interest Group, organized a discussion of this problem in 1998. [37] `Greg Ward`_ began work on Distutils at this time. [37] Distutils allows a developer to create a tarball_ from a project, or install it, by adding a file and issuing various commands. Distutils also supports compiling C extensions for your Python package. [37] The distutils-sig discussion list was created to discuss the development of distutils. [38]

Distutils was added to the Python standard library in Python 1.6 which was released in 2000. [37]

In 2000, Catalog Sig (catalog-sig) was created to discuss creating a package index. [37] [38] The first step was to standardize the metadata that could be cataloged by an index of Python packages. `Andrew Kuchling`_ drove the effort on this, culminating in PEP 241 in 2001. [37] [38] Distutils was modified so it could work with this standardized metadata. [37]

In late 2002, Richard Jones started work on the Python Package Index ("PyPI", or "the Cheeseshop"), and created PEP 301 to describe it. [38] In 2003, PyPI was up and running. [38] Distutils was extended so the metadata and packages themselves could be uploaded to this package index. [37]

Phillip Eby started work on Setuptools in 2004. Setuptools is a whole range of extensions to Distutils, such as a binary installation format (eggs), an automatic package installation tool, and the definition and declaration of scripts for installation. Work continued throughout 2005 and 2006, and feature after feature was added to support a whole range of advanced usage scenarios. [37] [38] The sheer amount of features that Setuptools brings to the table must be stressed: namespace packages, optional dependencies, automatic manifest building by inspecting version control systems, web scraping to find packages in unusual places, recognition of complex version numbering schemes, and so on. Some of these features perhaps seem esoteric to many, but complex projects use many of them. [37]

By 2005, you could install packages automatically into your Python interpreter using easy_install. Dependencies would be automatically pulled in. If packages contained C code it would pull in the binary egg, or if not available, it would compile one automatically. [37]

The problem remained that all these packages were installed into your Python interpreter. This is icky. People's site-packages directories became a mess of packages. You also need root access to easy_install a package into your system Python. Sharing all packages in a directory in general, even locally, is not always a good idea: one version of a library needed by one application might break another one. [37]

Ian Bicking drove one line of solutions: virtual-python, which evolved into workingenv, which evolved into virtualenv in 2007. The concept behind this approach is to allow the developer to create as many fully working Python environments as they like from a central system installation of Python. When the developer activates the virtualenv, easy_install will install all packages into the virtualenv's site-packages. This allows you to create a virtualenv per project and thus isolate projects from one another. [37]

In 2006 as well, Jim Fulton created Buildout, building on Setuptools and easy_install. Buildout can create an isolated project environment like virtualenv does, but is more ambitious: the goal is to create a system for repeatable installations of potentially very complex projects. Instead of writing an INSTALL.txt that tells others how to install the prerequisites for a package (Python or not), with Buildout these prerequisites can be installed automatically. [37]

The brilliance of Buildout is that it is easily extensible with new installation recipes. These recipes themselves are also installed automatically from PyPI. This has spawned a whole ecosystem of Buildout recipes that can do a whole range of things, from generating documentation to installing MySQL. [37]

In 2008, Ian Bicking created an alternative for easy_install (the installer included with setuptools) called pip ("pip installs packages"), also building on Setuptools. Less ambitious than buildout, it aimed to fix some of the shortcomings of easy_install. [37] [38]

By 2008, Setuptools had become a vital part of the Python development infrastructure. Unfortunately the Setuptools development process has some flaws. It is very centered around `Phillip Eby`_. While he had been extremely active before, by that time he was spending a lot less energy on it. Because of the importance of the technology to the wider community, various developers had started contributing improvements and fixes, but these were piling up. [37]

In 2008, after a period of trying to open up the Setuptools project itself, some of these developers, led by Tarek Ziade, decided to fork Setuptools. The fork is named Distribute. The aim is to develop the technology with a larger community of developers. One of the first big improvements of the Distribute project is Python 3 support. [37] [74]

Phillip Eby explained that he doesn't have time to do it unless someone would pay him for that. But in the meantime, he doesn't bless anyone to do it. Well, he has blessed some people to do it (Ian Bicking and Jim Fulton), but unfortunately these people are not willing to do it because they have a lot of other projects going on. Other people that could maintain it, including me, fail in his "unqualified people" category :)

So again, I decided with some other people to create a fork called "Distribute". It's a real fork located here :

—Tarek Ziade [74]

On February 28, 2011, the PyPA was created to take over the maintenance of pip and virtualenv from Ian Bicking, led by Carl Meyer, Brian Rosner and Jannis Leidel. Other proposed names were “ianb-ng”, “cabal”, “pack” and “Ministry of Installation”. [38]

PyPA is a working group that maintains many of the relevant projects in Python packaging. They host projects on GitHub and Bitbucket, and discuss issues on the pypa-dev mailing list.

On June 19, 2012, the effort to include “Distutils2/Packaging” in Python 3.3 was abandoned due to lack of involvement. [38]

13.2   Binaries

An alternative to shipping your code is freezing it — shipping it as an executable with a bundled Python interpreter. Many applications you use every day do this, e.g. Dropbox and BitTorrent_.

Use frozen binaries to share games and the like (so people don't need Python).

13.3   Python Package Index

The Python Package Index lists thousands of third-party modules for Python.

13.4   Python Distribution Utilities

Python 2.0 introduced Python Distribution Utilities (Distutils for short), which added support for adding third-party modules to an existing Python installation. [34]

As a developer, your responsibilities are [39]:

  1. Write a setup script.
  2. Optionally, write a setup configuration file
  3. Create a source distribution
  4. Optionally, create one or more built (binary) distributions. (Not all module developers have access to a multitude of platforms, so it’s not always feasible to expect them to create a multitude of built distributions.) [39]

Both developers and installers have the same basic user interface, i.e. the setup script. The difference is which Distutils commands they use: the sdist command is almost exclusively for module developers, while install is more often for installers (although most developers will want to install their own code occasionally). [39]

A module distribution is a collection of Python modules distributed together as a single downloadable resource and meant to be installed en masse. Examples of some well-known module distributions are Numeric Python, PyXML, PIL (the Python Imaging Library), or mxBase. (This would be called a package, except that term is already taken in the Python context: a single module distribution may contain zero, one, or many Python packages.) [39]

A Python distribution is a versioned archive file that contains Python packages, modules, and other resource files that are used to distribute a Release. The distribution file is what an end-user will download from the internet and install. [40]

 is the project specification file for distutils and setuptools.

If all you want to do is distribute a module called foo, contained in a file, then your setup script can be as simple as this [39]:

from distutils.core import setup


For a small module distribution, you might prefer to list all modules rather than listing packages—especially the case of a single module that goes in the “root package” (i.e., no package at all). [39]

py_modules = ['mod1', 'pkg.mod2']

Most information you supply to Distutils is supplied as keyword arguments to setup(). These keyword arguments fall into two categories: package metadata (name, version number) and information about what's in the package (a list of pure Python modules in this case).

A slightly more involved example might look like this:

from distutils.core import setup

      description='Python Distribution Utilities',
      author='Greg Ward',
      packages=['distutils', 'distutils.command'],
Here, we specify pure Python modules by package, rather than by module, along with more meta-data. More meta-data options can be found here:

The main purpose of the setup script is to describe your module distribution to the Distutils, so that the various commands that operate on your modules do the right thing. [39]   Relationships between Distributions and Packages

A distribution may relate to packages in three specific ways: it can require, provide, or obsolete packages. These relationships can be specified using keyword arguments to the distutils.core.setup() function.

  • Dependencies on other Python modules and packages can be specified by supplying the requires keyword argument to setup(). The value must be a list of strings. Each string specifies a package that is required, and optionally what versions are sufficient. For example, ==1.0 would mean that only version 1.0 is compatible, and >1.0, !=1.5.1, <2.0 would mean any version after 1.0 and before 2.0 is compatible, except 1.5.1.

  • To specify what we provide that other distributions can require we can use the provides keyword argument to setup(). The value for this keyword is a list of strings, each of which names a Python module or package, and optionally identifies the version. If the version is not specified, it is assumed to match that of the distribution. For example, mkpkg or mypkg (1.1).

  • A package can declare that it obsoletes other packages using the obsoletes keyword argument. The value for this is similar to that of the requires keyword: a list of strings giving module or package specifiers. Each specifier consists of a module or package name optionally followed by one or more version qualifiers. Version qualifiers are given in parentheses after the module or package name.

    The versions identified by the qualifiers are those that are obsoleted by the distribution being described. If no qualifiers are given, all versions of the named module or package are understood to be obsoleted.   Installing scripts

Scripts are files containing Python source code, intended to be started from the command line. The scripts option is a list of files to be treated as scripts. For example:

      scripts=['scripts/xmlproc_parse', 'scripts/xmlproc_val'],
      )   Installing package data

Package data can be added to packages using the package_data keyword argument to the setup() function. The value must be a mapping from package name to a list of relative path names that should be copied into the package. For example, if a package should contain a subdirectory with several data files, the files can be arranged like this in the source tree:


The corresponding call to setup() might be:

      package_dir={'mypkg': 'src/mypkg'},
      package_data={'mypkg': ['data/*.dat']},
      )   Installing additional files

The data_files option can be used to specify additional files needed by the module distribution which do not fit in the previous categories: configuration files, message catalogs, data files.

data_files specifies a sequence of (directory, files) pairs in the following way:

  data_files=[('bitmaps', ['bm/b1.gif', 'bm/b2.gif']),
              ('config', ['cfg/data.cfg']),
              ('/etc/init.d', ['init-script'])]

Each (directory, files) pair in the sequence specifies the installation directory and the files to install there.

13.4.2   Creating a source distribution

To create a source distribution for this module, you would create a setup script,, containing the above code, and run this command from a terminal

python sdist

sdist will create an archive file (e.g., a tarball on Unix, a ZIP file on Windows) containing your setup script and your module The archive file will be named foo-1.0.tar.gz (or .zip), and will unpack into a directory foo-1.0. [39]

13.4.3   Installing a source distribution

If you download a module source distribution, you can tell pretty quickly if it was packaged and distributed in the standard way, i.e. using the Distutils:

  • The distribution’s name and version number will be featured prominently in the name of the downloaded archive, e.g. foo-1.0.tar.gz or

  • The archive will unpack into a similarly-named directory: foo-1.0 or widget-0.9.7.

  • The distribution will contain a setup script, and a file named README.txt or possibly just README, which should explain that building and installing the module distribution is a simple matter of running one command from a terminal [34]:

    python install

Building and installing a module distribution using the Distutils is usually one simple command to run from a terminal [34]:

python install

13.4.4   Building

Running install builds and installs all modules in one run. If you prefer to work incrementally—especially useful if you want to customize the build process, or if things are going wrong—you can use the setup script to do one thing at a time. This is particularly helpful when the build and install will be done by different users—for example, you might want to build a module distribution and hand it off to a system administrator for installation (or do it yourself, with super-user privileges). [34]

For example, you can build everything in one step, and then install everything in a second step, by invoking the setup script twice [34]:

python build
python install

If you do this, you will notice that running the install command first runs the build command, which—in this case—quickly notices that it has nothing to do, since everything in the build directory is up-to-date. [34]

The build command is responsible for putting the files to install into a build directory. By default, this is build under the distribution root. [34]

13.4.5   Installation

After the build command runs (whether you run it explicitly, or the install command does it for you), the work of the install command is relatively simple: all it has to do is copy everything under build/lib (or build/lib.plat) to your chosen installation directory. [34]

If you don’t choose an installation directory -- i.e., if you just run install -- then the install command installs to the standard location for third-party Python modules. This location varies by platform and by how you built/installed Python itself. On Unix (and Mac OS X, which is also Unix-based), it also depends on whether the module distribution being installed is pure Python or contains extensions (“non-pure”) [34]:

Pure?   Standard installation location              Default value
-----   -----------------------------------------   --------------------------------------
Yes     prefix/lib/pythonX.Y/site-packages          /usr/local/lib/pythonX.Y/site-packages
No      exec-prefix/lib/pythonX.Y/site-packages     /usr/local/lib/pythonX.Y/site-packages

Most Linux distributions include Python as a standard part of the system, so prefix and exec-prefix are usually both /usr on Linux. If you build Python yourself on Linux (or any Unix-like system), the default prefix and exec-prefix are /usr/local. [34]

You can find these values at the interpreter [34]:

>>> import sys
>>> sys.prefix
>>> sys.exec_prefix

For a Mac using homebrew:

>>> import sys
>>> sys.prefix
>>> sys.exec_prefix

13.4.6   Alternate installation

Often, it is necessary or desirable to install modules to a location other than the standard location for third-party Python modules. For example, on a Unix system you might not have permission to write to the standard third-party module directory. Or you might wish to try out a module before making it a standard part of your local Python installation. [34]

The Distutils install command is designed to make installing module distributions to an alternate location simple and painless. [34]

Note that the various alternate installation schemes are mutually exclusive: you can pass --user, or --home, or --prefix and --exec-prefix, or --install-base and --install-platbase, but you can’t mix from these groups. [34]   The user scheme

This scheme is designed to be the most convenient solution for users that don’t have write permission to the global site-packages directory or don’t want to install into it. It is enabled with a simple option [34]:

python install --user

The advantage of using this scheme compared to the other ones described below is that the user site-packages directory is under normal conditions always included in sys.path, which means that there is no additional step to perform after running the script to finalize the installation. [34]   The home scheme

The idea behind the “home scheme” is that you build and maintain a personal stash of Python modules. This scheme’s name is derived from the idea of a “home” directory on Unix, since it’s not unusual for a Unix user to make their home directory have a layout similar to /usr/ or /usr/local/. This scheme can be used by anyone, regardless of the operating system they are installing for. [34]

Installing a new module distribution is as simple as:

python setup.py install --home=~

To make Python find the distributions installed with this scheme, you may have to modify Python’s search path.

13.5   Installing a development version

  1. Check out the project's main development branch like so:

    git clone git:// django-trunk
  2. Make sure that the Python interpreter can load the project's code. The most convenient way to do this is via pip. Run the following command in a virtualenv:

    pip install -e django-trunk/

    This will make the project's code importable.

When you want to update your copy of the project's source code, just run the command git pull from within the project's directory.

This can be combined effectively with git submodules.

13.6   Setuptools

Author:Phillip Eby

Setuptools is an extension to distutils for large or complex distributions. It was created by Phillip Eby in 2004.

setuptools is a collection of enhancements to the Python distutils (for Python 2.3.5 and up on most platforms; 64-bit platforms require a minimum of Python 2.4) that allow you to more easily build and distribute Python packages, especially ones that have dependencies on other packages. [75]


The Distribute fork was created specifically due to the lack of progress in EasyInstall development.

13.6.1   easy_install

EasyInstall is a module bundled with Setuptools.

13.6.2   Eggs

An egg is a Built Distribution format introduced by setuptools.

Eggs are a distribution format for Python modules, similar in concept to Java's "jars" or Ruby's "gems". They differ from previous Python distribution formats in that they are importable (i.e. they can be added to sys.path), and they are discoverable, meaning that they carry metadata that unambiguously identifies their contents and dependencies, and thus can be automatically found and added to sys.path in response to simple requests of the form, "get me everything I need to use docutils' PDF support".

13.6.3   pkg_resources

pkg_resources - Package resource API

pkg_resources is a module distributed with setuptools that is used to find and manage Python package dependencies.

A resource is a logical file contained within a package, or a logical subdirectory thereof. The package resource API expects resource names to have their path parts separated with /, not whatever the local path separator is. Do not use os.path operations to manipulate resource names being passed into the API.


13.7   Virtualenv

virtualenv is a tool to create isolated Python environments.

The basic usage is:

$ virtualenv ENV

This creates ENV/lib/pythonX.X/site-packages, where any libraries you install will go. It also creates ENV/bin/python, which is a Python interpreter that uses this environment. Any time you use that interpreter (including when a script has #!/path/to/ENV/bin/python in it), the libraries in that environment will be used.
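A quick way to check whether the current interpreter is running inside such an environment (a sketch; it relies on sys.base_prefix from PEP 405 venvs, Python 3.3+, and on the sys.real_prefix attribute that classic virtualenv sets):

```python
import sys

def in_virtualenv():
    # In a venv, sys.prefix points at ENV while sys.base_prefix keeps
    # pointing at the original installation; classic virtualenv instead
    # sets sys.real_prefix on the interpreter.
    base = getattr(sys, "base_prefix", sys.prefix)
    return base != sys.prefix or hasattr(sys, "real_prefix")

print(in_virtualenv())
```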

A new virtualenv also includes the pip installer, so you can use ENV/bin/pip to install additional packages into the environment.

In a newly created virtualenv there will be a bin/activate shell script:

$ source bin/activate

This will change your $PATH so its first entry is the virtualenv’s bin/ directory. (You have to use source because it changes your shell environment in-place.) This is all it does; it’s purely a convenience. If you directly run a script or the python interpreter from the virtualenv’s bin/ directory (e.g. path/to/env/bin/pip or /path/to/env/bin/python), there’s no need for activation.

After activating an environment you can use the function deactivate to undo the changes to your $PATH.

The activate script will also modify your shell prompt to indicate which environment is currently active. You can disable this behavior, which can be useful if you have your own custom prompt that already displays the active environment name. To do so, set the VIRTUAL_ENV_DISABLE_PROMPT environment variable to any non-empty value before running the activate script.

If you build with virtualenv --system-site-packages ENV, your virtual environment will inherit packages from /usr/lib/python2.7/site-packages (or wherever your global site-packages directory is). If you want isolation from the global system, do not use this flag.

When uninstalling a package, pip does not uninstall unused dependencies (mostly because the implementation would probably be error-prone [73]):

$ pip install specloud
$ pip freeze
$ pip uninstall specloud
$ pip freeze

14   Misc

Print out JSON nicely: curl … | python -m simplejson.tool.

To print JSON:

echo '{"test":1,"test2":2}' | python -m json.tool
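The same pretty-printing is available programmatically via json.dumps:

```python
import json

data = {"test": 1, "test2": 2}
# indent adds newlines and indentation; sort_keys gives stable key order
pretty = json.dumps(data, indent=4, sort_keys=True)
print(pretty)
```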

Use IPython notebooks to create lectures for class.

15   Community

16   Further reading

17   References

[PEP0]Goodger, Warsaw. 2000. Index of Python Enhancement Proposals.
[PEP2](1, 2) Faassen. 2001. Procedure for Adding New Modules.
[PEP4]Lowis. 2000. Deprecation of Standard Modules.
[PEP310](1, 2, 3) Hudson, Moore. 2002. Reliable Acquisition/Release Pairs.
[3]Mark Mruss. Apr 25 2010. Introducing Descriptors and Properties.
[6]Marty Alchin, Nov 23, 2007. Python Descriptors, Part 1 of 2.
[7]Michele Simionato. Aug 12, 2008. Things to Know About Python Super [1 of 3].
[8](1, 2, 3, 4) Raymond Hettinger. May 26, 2011. Python's super() considered super!
[10]Guido van Rossum. Mar 10, 2005. The fate of reduce() in Python 3000.
[12]Guido van Rossum. Jun 23, 2010. Method Resolution Order.
[13]Ian Ward. 2011-12-19. Unfortunate Python.
[14]Private variables through name mangling.
[15]Guido van Rossum. Aug 22, 2011. Response to "Why Python doesn't have real information hiding the in the way of the C++ (or Ruby) ideas of public, protected and private?".
[16]The Python 2.3 Method Resolution Order.
[17](1, 2) Lexical analysis.
[18](1, 2, 3, 4, 5) Guido van Rossum. 1996. Foreword for Mark Lutz' book "Programming Python" (1st ed).
[19]"With Statement Context Managers".
[20](1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15) "Objects, values and types"
[21]Built-in Types.
[22]Raymond Hettinger. PyCon US 2013. Transforming Code into Beautiful, Idiomatic Python.
[24]Context Manager Types.
[25](1, 2) Kuchling. What's New in Python 2.2.
[26](1, 2) Kuchling. What's New in Python 2.4.
[27]Kuchling. What's New in Python 2.5.
[28](1, 2, 3, 4) Naftali Harris. August 26, 2014. Python Subclass Relationships Aren't Transitive.
[29](1, 2) Armin Ronacher. July 2, 2013. The Updated Guide to Unicode on Python.
[30](1, 2, 3, 4, 5, 6, 7) Armin Ronacher. More about Unicode in Python 2 and 3.
[31](1, 2, 3, 4, 5, 6, 7, 8, 9, 10) Unicode HOWTO.
[32](1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13) Data model.
[33]Chapter 1: Introduction to Django.
[34](1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18) Greg Ward. Installing Python Modules.
[35]Brian. Why does defining __getitem__ on a class make it iterable in python?
[37](1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18) Martijn Faassen. Nov 09, 2009. A history of Python packaging.
[38](1, 2, 3, 4, 5, 6, 7, 8, 9) Apr 09, 2014. Packaging History.
[39](1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11) Greg Ward, Anthony Baxter. An Introduction to Distutils.
[40](1, 2) May 10, 2014. Python Packaging User Guide: Glossary.
[41](1, 2) Thread State and the Global Interpreter Lock.
[42]Python Web Development.
[43](1, 2, 3) logging – Report status, error, and informational messages.
[44](1, 2, 3, 4, 5) Michele Simionato. Aug 9, 2008. Metaclasses in Python 3.0.
[45]David Mertz. April 17, 2003. A Primer on Python Metaclass Programming.
[46]Guido van Rossum. Mar 8, 2002. PEP 285 -- Adding a bool type.
[47](1, 2, 3, 4) Fredrik Lundh. January 07, 1999. Importing Python Modules.
[48]Donald Stufft. July 22 2013. setup.py vs requirements.txt.
[49]Mahmoud Hashemi. December 10, 2014. 10 Myths of Enterprise Python.
[50](1, 2, 3) David Goodger, Guido van Rossum. May 29, 2001. PEP 257 -- Docstring Conventions.
[51](1, 2, 3, 4) Guido van Rossum. Jul 31, 2008. Origin of BDFL.
[52](1, 2) Guido van Rossum. May 15, 2003. Python main() functions.
[53]Holger Krekel. 2013. @PyGrunn 2013. Re-inventing Python packaging & testing.
[54](1, 2) Talin. Apr 22, 2006. PEP 3102. Keyword-Only Arguments.
[55]Tim Peters. March 6, 1999. docstring-driven testing.
[56]man python
[57]Guido van Rossum. May 10, 2013. Implicit string literal concatenation considered harmful?
[58]The Import System : Loading : Submodules. Python 3.4.3.
[59]C. Titus Brown. Feb 21 2010. What's with the goat?
[60]Michael Foord. 2010. New and Improved: Coming changes to unittest in Python 2.7 & 3.2.
[61](1, 2, 3, 4, 5) A.M. Kuchling. What's New in Python 2.1
[62](1, 2) PyUnit - the standard unit testing framework for Python.
[63]Brian Okken. Perhaps unittest is wrong, and tearDown should always run.
[64]tearDown in unittest should be executed regardless of result in setUp.
[65](1, 2, 3, 4, 5) mis6. Apr 21, 2003. Is this a super bug?.
[67](1, 2)
[68](1, 2, 3, 4, 5) Michele Simionato. August 2008. Things to know about super.
[69](1, 2) Piet Delport. Oct 28 2012. Comment on "Multiple inheritance is hard" by Ned Batchelder.
[70](1, 2) Ned Batchelder. November 13 2007. Re-throwing exceptions in Python.
[71](1, 2, 3) Armin Ronacher. Sep 22 2008. Class method differences in Python: bound, unbound and static.
[72]Brett Cannon. Dec 3, 2015. If I were designing Python's import from scratch.
[73]Apr 2 2009. pip uninstall.
[74](1, 2) Tarek Ziade. 2009. The strange world of packaging - forking setuptools.
[75](1, 2) Building and Distributing Packages with setuptools.
[76]Guido van Rossum. Jan 27, 2008. [Python-Dev] functions vs methods (was Re: trunc())
[77]Guido van Rossum. Nov 22, 2006. [Python-3000] Special methods and interface-based type system.
[78]David Beazley. 2009-05-11. Inside the Python GIL.


There are two kinds of Python projects: those intended for others to use (e.g. libraries and frameworks) and those that aren't (e.g. applications). Developers need to provide a number of pieces of metadata to distribute their projects (e.g. name, version, dependencies). This is the function of setup.py. Dependencies there are specified without pinning versions (e.g. just "requests").

An application typically has a set of dependencies, often times even a very complex set of dependencies, that it has been tested against. Being a specific instance that has been deployed, it typically does not have a name, nor any of the other packaging related metadata. This is reflected in the abilities of a pip requirements file. A typical requirements file might look something like:

# This is an implicit value, here for clarity
--index-url https://pypi.python.org/simple/

requests==1.2.0

Here you have each dependency shown along with an exact version specifier. While a library tends to want wide, open-ended version specifiers, an application wants very specific dependencies. It may not have mattered up front which version of requests was installed, but you want the same version to install in production as the one you developed and tested with locally.

At the top of this file you'll also notice an --index-url line. Your typical requirements.txt won't list this explicitly unless it's not using PyPI; it is, however, an important part of a requirements.txt. This single line is what turns the abstract dependency requests==1.2.0 into a “concrete” dependency of “requests 1.2.0 from that index”.

This split between abstract and concrete is an important one. It is what allows the PyPI mirroring infrastructure to work. It is what allows a company to host its own private package index. It is even what enables you to fork a library to fix a bug or add a feature and use your own fork. Because an abstract dependency is a name and an optional version specifier, you can install it from PyPI, from another index, or from your own filesystem. You can fork a library, change the code, and as long as it has the right name and version specifier, everything depending on it will happily go on using your fork.

A more extreme version of what can happen when you use a concrete requirement where an abstract requirement should be used can be found in the Go language. In Go, the default package manager (go get) allows you to specify imports via a URL inside the code, which the package manager collects and downloads. This looks something like:

import (
    "github.com/foo/bar"
)

Here you can see that an exact url to a dependency has been specified. Now if I used a library that specified its dependencies this way and I wanted to change the “bar” library because of a bug that was affecting me or a feature I needed, I would not only need to fork the bar library, but I would also need to fork the library that depended on the bar library to update it. Even worse, if the bar library was say, 5 levels deep, then that's a potential of 5 different packages that I would need to fork and modify only to point it at a slightly different “bar”.

Developing libraries

When developing a library, it is your application. You want a specific set of dependencies that you want to fetch from a specific location, and you know that you should have abstract dependencies in your setup.py and concrete dependencies in your requirements.txt, but you don't want to maintain two separate lists which will inevitably go out of sync. As it turns out, pip requirements files have a construct to handle just such a case. Given a directory with a setup.py inside of it, you can write a requirements file that looks like:


--index-url https://pypi.python.org/simple/
-e .

Now your pip install -r requirements.txt will work just as before. It will first install the library located at the file path . and then move on to its abstract dependencies, combining them with its --index-url option and turning them into concrete dependencies and installing them.

This method grants another powerful ability. Let's say you have two or more libraries that you develop as a unit but release separately, or maybe you've just split out part of a library into its own piece and haven't officially released it yet. If your top-level library still depends on just the name, then you can install the development version when using requirements.txt and the release version when not, using a file like:


-e git+https://github.com/foo/bar.git#egg=bar
-e .

This will first install the bar library from its repository, making it equal to the name “bar”, and then install the local package, again combining its dependencies with the --index-url option and installing them; but this time, since the “bar” dependency has already been satisfied, it will skip it and continue to use the in-development version.

Further reading:

[101]Requirements files.

When installing software, and Python packages in particular, it’s common that you get a lot of libraries installed. You just did easy_install MyPackage and you get a dozen packages. Each of these packages has its own version.

Maybe you ran that installation and it works. Great! Will it keep working? Did you have to provide special options to get it to find everything? Did you have to install a bunch of other optional pieces? Most of all, will you be able to do it again? Requirements files give you a way to create an environment: a set of packages that work together.

Requirements files make installation explicit and repeatable: they are a list of packages to install.

You might think you could list these specific versions in MyApp’s setup.py – but if you do that you’ll have to edit MyApp if you want to try a new version of Framework, or release a new version of MyApp if you determine that Library 0.3 doesn’t work with your application.

You can also add optional libraries and support tools that MyApp doesn’t strictly require, giving people a set of recommended libraries.

You can also include “editable” packages – packages that are checked out from Subversion, Git, Mercurial and Bazaar.

Requirement files are mostly flat. Maybe MyApp requires Framework, and Framework requires Library. I encourage you to still list all these in a single requirement file; it is the nature of Python programs that there are implicit bindings directly between MyApp and Library. For instance, Framework might expose one of Library’s objects, and so if Library is updated it might directly break MyApp. If that happens you can update the requirements file to force an earlier version of Library, and you can do that without having to re-release MyApp at all.

Sometimes a project has "recommended" dependencies, that are not required for all uses of the project. For example, a project might offer optional PDF output if ReportLab is installed, and reStructuredText support if docutils is installed. These optional features are called "extras", and setuptools allows you to define their requirements as well. In this way, other projects that require these optional features can force the additional requirements to be installed, by naming the desired extras in their install_requires.

For example, let's say that Project A offers optional PDF and reST support:

    setup(
        name="Project-A",
        ...
        extras_require = {
            'PDF':  ["ReportLab>=1.2", "RXP"],
            'reST': ["docutils>=0.3"],
        },
    )

As you can see, the extras_require argument takes a dictionary mapping names of "extra" features, to strings or lists of strings describing those features' requirements. These requirements will not be automatically installed unless another package depends on them (directly or indirectly) by including the desired "extras" in square brackets after the associated project name. (Or if the extras were listed in a requirement spec on the EasyInstall command line.)

Note, by the way, that if a project ends up not needing any other packages to support a feature, it should keep an empty requirements list for that feature in its extras_require argument, so that packages depending on that feature don't break (due to an invalid feature name). For example, if Project A above builds in PDF support and no longer needs ReportLab, it could change its setup to this:

    setup(
        name="Project-A",
        ...
        extras_require = {
            'PDF':  [],
            'reST': ["docutils>=0.3"],
        },
    )

so that Package B doesn't have to remove the [PDF] from its requirement specifier.

From the pip docs:

Often, you will want a fast install from local archives, without probing PyPI.

First, download the archives that fulfill your requirements:

$ pip install --download <DIR> -r requirements.txt

Then, install using --find-links and --no-index:

$ pip install --no-index --find-links=[file://]<DIR> -r requirements.txt

This gives offline access, but is not necessarily faster (and may be slower) than a regular pip install:

$ time pip install -r requirements/simple.txt -q


$ time pip install --requirement requirements/simple.txt --no-index
--find-links=.packages -q



While not requiring a separate compiler toolchain like C++, Python is in fact compiled to bytecode, much like Java and many other compiled languages. Further compilation steps, if any, are at the discretion of the runtime, be it CPython, PyPy, Jython/JVM, IronPython/CLR, or some other process virtual machine.
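The compilation to bytecode can be observed directly with the standard dis module:

```python
import dis

def add(a, b):
    return a + b

# CPython compiles the function body to bytecode at definition time;
# the raw bytecode lives on the function's code object.
print(type(add.__code__.co_code))   # <class 'bytes'>

# dis renders the bytecode as human-readable instructions
dis.dis(add)
```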

Python may be one of the most flexible technologies among general-use programming languages. To list just a few of its application areas:

- Telephony infrastructure (Twilio)
- Payments systems (PayPal, Balanced Payments)
- Neuroscience and psychology (many, many examples)
- Numerical analysis and engineering (numpy, numba, and many more)
- Animation (LucasArts, Disney, Dreamworks)
- Gaming backends (Eve Online, Second Life, Battlefield, and so many others)
- Email infrastructure (Mailman, Mailgun)
- Media storage and processing (YouTube, Instagram, Dropbox)
- Operations and systems management (Rackspace, OpenStack)
- Natural language processing (NLTK)
- Machine learning and computer vision (scikit-learn, Orange, SimpleCV)
- Security and penetration testing (so many, and eBay/PayPal)
- Big Data (Disco, Hadoop support)
- Calendaring (Calendar Server, which powers Apple iCal)
- Search systems (ITA, Ultraseek, and Google)
- Internet infrastructure (DNS) (BIND 10)

Not that it is a competition, but as a fun fact, Python is more strongly-typed than Java. Java has a split type system for primitives and objects, with null lying in a sort of gray area. On the other hand, modern Python has a unified strong type system, where the type of None is well-specified. Furthermore, the JVM itself is also dynamically-typed, as it traces its roots back to an implementation of a Smalltalk VM acquired by Sun.
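A quick illustration of that strong typing:

```python
# Strong typing: Python refuses to implicitly coerce between types,
# where Java would silently convert 1 to a string and produce "11".
try:
    result = "1" + 1
except TypeError:
    result = "TypeError"
print(result)                # TypeError

# None has a well-specified type of its own
print(type(None).__name__)   # NoneType
```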

Each runtime has its own performance characteristics, and none of them are slow per se. The more important point here is that it is a mistake to assign performance assessments to a programming language. There are several Python implementations:

- CPython is the reference implementation, and also the most widely distributed and used.
- Jython is a mature implementation of Python for usage with the JVM.
- IronPython is Microsoft’s Python for the Common Language Runtime, aka .NET.
- PyPy is an up-and-coming implementation of Python, with advanced features such as JIT compilation, incremental garbage collection, and more.

Having cleared that up, here is a small selection of cases where Python has offered significant performance advantages:

- Using NumPy as an interface to Intel’s MKL SIMD
- PyPy’s JIT compilation achieves faster-than-C performance
- Disqus scales from 250 to 500 million users on the same 100 boxes

Scale has many definitions, but by any definition, YouTube is a web site at scale. More than 1 billion unique visitors per month, over 100 hours of uploaded video per minute, and going on 20 percent of peak Internet bandwidth, all with Python as a core technology. Dropbox, Disqus, Eventbrite, Reddit, Twilio, Instagram, Yelp, EVE Online, Second Life, and, yes, eBay and PayPal all have Python scaling stories that prove scale is more than just possible: it’s a pattern.

Python has great concurrency primitives, including generators, greenlets, Deferreds, and futures. Python has great concurrency frameworks, including eventlet, gevent, and Twisted. Python has had some amazing work put into customizing runtimes for concurrency, including Stackless and PyPy. All of these and more show that there is no shortage of engineers effectively and unapologetically using Python for concurrent programming. Also, all of these are officially supported and/or used in enterprise-level production environments.

The Global Interpreter Lock, or GIL, is a performance optimization for most use cases of Python, and a development ease optimization for virtually all CPython code. The GIL makes it much easier to use OS threads or green threads (greenlets usually), and does not affect using multiple processes. For more information, see [41] and [78].
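A minimal sketch of threaded code under the GIL. Note that the GIL serializes bytecode execution, but compound operations like counter += 1 are still not atomic, so an explicit lock remains necessary:

```python
import threading

counter = 0
lock = threading.Lock()

def work():
    global counter
    for _ in range(100_000):
        with lock:          # guard the read-modify-write, which the GIL
            counter += 1    # alone does not make atomic

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # 400000
```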


It all started with ABC, a wonderful teaching language that I had helped create in the early eighties. It was an incredibly elegant and powerful language, aimed at non-professional programmers. Despite all its elegance and power and the availability of a free implementation, ABC never became popular in the Unix/C world. I can only speculate about the reasons, but here's a likely one: the difficulty of adding new "primitive" operations to ABC. It was a monolithic, "closed system", with only the most basic I/O operations: read a string from the console, write a string to the console. I decided not to repeat this mistake in Python.

Besides this intention, I had a number of other ideas for improvement over ABC, and was eager to try them out. For instance, ABC's powerful data types turned out to be less efficient than we hoped. There was too much emphasis on theoretically optimal algorithms, and not enough tuning for common cases. I also felt that some of ABC's features, aimed at novice programmers, were less desirable for the (then!) intended audience of experienced Unix/C programmers. For instance: ABC's idiosyncratic syntax (all uppercase keywords!); some terminology (e.g. "how-to" instead of "procedure"); and the integrated structured editor, which its users almost universally hated. Python would rely more on the Unix infrastructure and conventions, without being Unix-bound. And in fact, the first implementation was done on a Mac.

As it turned out, Python is remarkably free from many of the hang-ups of conventional programming languages. This is perhaps due to my choice of examples: besides ABC, my main influence was Modula-3. This is another language with remarkable elegance and power, designed by a small, strong-willed team (most of whom I had met during a summer internship at DEC's Systems Research Center in Palo Alto). Imagine what Python would have looked like if I had modelled it after the Unix shell and C instead! (Yes, I borrowed from C too, but only its least controversial features, in my desire to please the Unix/C audience.)

Any individual creation has its idiosyncrasies, and occasionally its creator has to justify them. Perhaps Python's most controversial feature is its use of indentation for statement grouping, which derives directly from ABC. It is one of the language's features that is dearest to my heart. It makes Python code more readable in two ways. First, the use of indentation reduces visual clutter and makes programs shorter, thus reducing the attention span needed to take in a basic unit of code. Second, it allows the programmer less freedom in formatting, thereby enabling a more uniform style, which makes it easier to read someone else's code. (Compare, for instance, the three or four different conventions for the placement of braces in C, each with strong proponents.)

This emphasis on readability is no accident. As an object-oriented language, Python aims to encourage the creation of reusable code. Even if we all wrote perfect documentation all of the time, code can hardly be considered reusable if it's not readable. Many of Python's features, in addition to its use of indentation, conspire to make Python code highly readable. This reflects the philosophy of ABC, which was intended to teach programming in its purest form, and therefore placed a high value on clarity.

Readability is often enhanced by reducing unnecessary variability. When possible, there's a single, obvious way to code a particular construct. This reduces the number of choices facing the programmer who is writing the code, and increases the chance that it will appear familiar to a second programmer reading it. Yet another contribution to Python's readability is the choice to use punctuation mostly in a conservative, conventional manner. Most operator symbols are familiar to anyone with even a vague recollection of high school math, and no new meanings have to be learned for comic strip curse characters like @&$!.


The Python language supports two ways of representing what we know as “strings”, i.e. series of characters. In Python 2, the two types are string and unicode, and in Python 3 they are bytes and string. A key aspect of the Python 2 string and Python 3 bytes types is that they contain no information regarding what encoding the data is stored in. For this reason they were commonly referred to as byte strings in Python 2, and Python 3 makes this name more explicit. The origins of this come from Python’s background of being developed before the Unicode standard was even available, back when strings were C-style strings and were just that, a series of bytes. Strings that had only values below 128 just happened to be ASCII strings and were printable on the console, whereas strings with values above 128 would produce all kinds of graphical characters and bells.

Contrast the “byte-string” type with the “unicode/string” type. Objects of this latter type are created whenever you say something like u"hello world" (or in Python 3, just "hello world"). In this case, Python represents each character in the string internally using multiple bytes per character (something similar to UTF-16). What’s important is that when using the unicode/string type to store strings, Python knows the data’s encoding; it’s in its own internal format. Whereas when using the string/bytes type, it does not.

When Python 2 attempts to treat a byte-string as a string, which means it’s attempting to compare/parse its characters, to coerce it into another encoding, or to decode it to a unicode object, it has to guess what the encoding is. In this case, it will pretty much always guess the encoding as ascii... and if the byte-string contains bytes above value 128, you’ll get an error. Python 3 eliminates much of this confusion by just raising an error unconditionally if a byte-string is used in a character-aware context.
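A short Python 3 illustration of the bytes/str split:

```python
# str knows its characters; bytes is just raw data.
text = "café"
data = text.encode("utf-8")
print(data)                    # b'caf\xc3\xa9'
print(data.decode("utf-8"))    # café

# Using the wrong encoding in a character-aware context fails loudly
# instead of guessing:
try:
    data.decode("ascii")
except UnicodeDecodeError:
    print("UnicodeDecodeError")
```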

There is one operation that Python can do with a non-ASCII byte-string, and it’s a great source of confusion: it can dump the byte-string straight out to a stream or a file, with nary a care what the encoding is. To Python, this is pretty much like dumping any other kind of binary data (like an image) to a stream somewhere. In Python 2, programs that embed all kinds of international characters and encodings into plain byte-strings (i.e. using "hello world" style literals) can fly right through their run, sending reams of strings out to wherever they are going, and the programmer, seeing the same output as was expressed in the input, is now under the illusion that his or her program is Unicode-compliant. In fact, the program has no Unicode awareness whatsoever, and similarly has no ability to interact with libraries that are Unicode aware. Python 3 makes this much less likely by defaulting to unicode as the storage format for strings.

Python creates a new string object each time one is needed. That's reasonable, but we need to remember that two strings may have the same value yet be different objects:

>>> c = "It's just a flesh wound."
>>> d = "It's just a flesh wound."
>>> c is d
False

Strings c and d have the same value but are different objects, and each has a separate copy of its string data. This is the usual behaviour when strings are created. However, Python can automatically intern short strings - re-using existing string objects rather than allocating new ones:

>>> a = 'Arthur'
>>> b = 'Arthur'
>>> a is b
True

Interning was originally intended to avoid string duplication for internal strings like method and class names. In Python 2.7, according to the docs, ...the names used in Python programs are automatically interned, and the dictionaries used to hold module, class or instance attributes have interned keys. Python automatically interned a and b because they are short, identifier-like values, but we can request that Python intern any string by using intern():

>>> e = intern("It's just a flesh wound.")
>>> f = intern("It's just a flesh wound.")
>>> e is f
True

This is sometimes called string "folding". Folding does just what we want: combines strings so that instead of code holding references to multiple string objects that have the same value, all the references for any given value are to the same string object. Remember that Python strings are immutable, so it's completely safe to have many different parts of the code holding references to the same string object: there's no way for the object's value to be changed.
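In Python 3 the builtin moved to sys.intern(); the folding behaviour is the same:

```python
import sys

# sys.intern() returns the canonical object for a given string value,
# so two separately constructed equal strings become one object.
e = sys.intern("It's just a flesh wound.")
f = sys.intern("It's just a flesh wound.")
print(e is f)   # True
```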

for x in y: reads naturally because it parallels the membership test if x in y.


iter accepts a sentinel value. See File Iterators. Curious what the uses might be.
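One common use of the two-argument form: call a callable repeatedly until it returns the sentinel. File objects fit naturally, since readline returns the empty string at EOF:

```python
import io

f = io.StringIO("first\nsecond\n")
# iter(callable, sentinel) calls f.readline until it returns ""
lines = list(iter(f.readline, ""))
print(lines)   # ['first\n', 'second\n']
```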

Templates for new projects:


The idea of shipping a standard library made more sense in the 90s, before the popularity of the Internet. Nowadays, we have many libraries which are much better than the standard library. Do we stick with this idea of the standard library? Or do we focus on improving distribution?

Perl started with the idea of there are many ways to do things with CPAN.

Python is still trying to catch up to CPAN. Trying to standardize on pip and wheels, but of course easy_install remains a reality.

tox aims to standardize testing.

People won't switch if you just invent a new standard. You need to write meta tools so that nobody has to change anything about the way they are doing things. tox adopts this strategy.

Moshe Zadka. Idioms and Anti-Idioms in Python.

On why to try-except:

The following is a very popular anti-idiom:

def get_status(file):
    if not os.path.exists(file):
        print "file not found"
    return open(file).readline()

Consider the case where the file gets deleted between the time the call to os.path.exists() is made and the time open() is called. In that case the last line will raise an IOError. The same thing would happen if file exists but has no read permission. Since testing this on a normal machine on existent and non-existent files makes it seem bugless, the test results will seem fine, and the code will get shipped. Later an unhandled IOError (or perhaps some other EnvironmentError) escapes to the user, who gets to watch the ugly traceback.
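The idiomatic fix is EAFP: attempt the open and handle the failure in one place. A Python 3 sketch (the error-string format here is illustrative, not the HOWTO's exact code):

```python
def get_status(path):
    # EAFP: just try to open the file. A pre-check with
    # os.path.exists() can race against deletion, and it also
    # misses other failures such as missing read permission.
    try:
        with open(path) as f:
            return f.readline()
    except EnvironmentError as err:  # OSError; covers the old IOError
        return "error: {}".format(err)

print(get_status("/no/such/file"))
```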

On line breaks:

Since Python treats a newline as a statement terminator, and since statements are often longer than is comfortable to put on one line, many people do:

if foo.bar()['first'][0] == baz.quux(1, 2)[5:9] and \
   calculate_number(10, 20) != forbulate(500, 360):

You should realize that this is dangerous: a stray space after the backslash would make this line wrong, and stray spaces are notoriously hard to see in editors. In this case, at least it would be a syntax error, but if the code was:

value = foo.bar()['first'][0]*baz.quux(1, 2)[5:9] \
        + calculate_number(10, 20)*forbulate(500, 360)

then it would just be subtly wrong.

It is usually much better to use the implicit continuation inside parentheses:

This version is bulletproof:

value = (foo.bar()['first'][0]*baz.quux(1, 2)[5:9]
         + calculate_number(10, 20)*forbulate(500, 360))

Design and History FAQ

Why are colons required for the if/while/def/class statements?

The colon is required primarily to enhance readability (one of the results of the experimental ABC language). Consider this:

if a == b
    print a


if a == b:
    print a

Notice how the second one is slightly easier to read. Notice further how a colon sets off the example in this FAQ answer; it’s a standard usage in English.

Another minor reason is that the colon makes it easier for editors with syntax highlighting; they can look for colons to decide when indentation needs to be increased instead of having to do a more elaborate parsing of the program text.

Within a module, the module’s name (as a string) is available as the value of the global variable __name__. Therefore:

>>> import os
>>> os.__name__
'os'

__name__ in the interpreter evaluates to '__main__' (where is this defined?). This allows:

>>> __name__
'__main__'

When you run a script with python, it doesn't import the script; it just executes each statement in the interpreter, which is why if __name__ == '__main__' works.

So to get around this, just import the module from the interpreter.

It seems the interpreter module is __main__, which explains __name__.

This module represents the (otherwise anonymous) scope in which the interpreter’s main program executes — commands read either from standard input, from a script file, or from an interactive prompt. It is this environment in which the idiomatic “conditional script” stanza causes a script to run:

if __name__ == "__main__":

The main program is defined here:

Annoyed by virtualenvwrapper since it's not a single command. I can't remember how to do things like list all the virtualenvs.

Python is part of the Linux Standard Base.

Why [] is faster than list:

So it really boils down to Python's inherent dynamism. With list it needs to account for code like list = dict; list(), but with [] there's no way for a user to override it.
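The difference is visible in the bytecode: [] compiles to a single BUILD_LIST opcode, while list() needs a runtime name lookup plus a call (a sketch using dis; the exact surrounding opcodes vary across CPython versions):

```python
import dis

# Disassemble both expressions and collect the opcode names.
empty_literal = [i.opname for i in dis.get_instructions(compile("[]", "<s>", "eval"))]
empty_call = [i.opname for i in dis.get_instructions(compile("list()", "<s>", "eval"))]

print(empty_literal)  # contains BUILD_LIST: the list is built directly
print(empty_call)     # contains LOAD_NAME plus a CALL opcode: 'list' is
                      # looked up at runtime, so it could be shadowed
```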

Replacing a for loop with a list or generator comprehension will move at least some and potentially all of the execution of the loop into pure C.
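For example (a minimal sketch), these two produce the same list, but the comprehension avoids the per-iteration bytecode for the append call:

```python
# Explicit loop: each iteration executes several bytecode steps,
# including a method call to squares.append.
squares = []
for n in range(10):
    squares.append(n * n)

# List comprehension: the looping machinery runs inside the
# interpreter's C implementation of the comprehension.
squares2 = [n * n for n in range(10)]

print(squares == squares2)  # True
```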

Replacing if statements with built-in methods (the big examples here are things like dict.get and dict.setdefault) will similarly move execution down into C.
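A word-counting sketch of the dict.get case (the word list is made up):

```python
words = ["spam", "egg", "spam", "spam"]

# if-statement version: the membership test and branch run as bytecode.
counts = {}
for w in words:
    if w in counts:
        counts[w] += 1
    else:
        counts[w] = 1

# dict.get version: the "missing key" branch happens inside the C
# implementation of dict.
counts2 = {}
for w in words:
    counts2[w] = counts2.get(w, 0) + 1

print(counts == counts2)  # True
```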

To clean all .pyc files:

find . -name \*.pyc -delete

To not generate bytecode, run Python with the -B flag or set the PYTHONDONTWRITEBYTECODE environment variable:

python -B script.py


For very long lines, from PEP 8:

Backslashes may still be appropriate at times. For example, long, multiple with-statements cannot use implicit continuation (prior to Python 3.10), so backslashes are acceptable:

with open('/path/to/some/file/you/want/to/read') as file_1, \
     open('/path/to/some/file/being/written', 'w') as file_2:
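Since Python 3.10, the grammar also allows implicit continuation here by parenthesizing the context managers, so the backslash is no longer needed (a sketch with temp files standing in for the paths above):

```python
import os
import tempfile

# Stand-in files for the read/write paths in the example above.
read_path = os.path.join(tempfile.mkdtemp(), "in.txt")
write_path = read_path + ".out"
with open(read_path, "w") as f:
    f.write("hello\n")

# Parenthesized context managers: implicit continuation, no backslash.
with (
    open(read_path) as file_1,
    open(write_path, "w") as file_2,
):
    file_2.write(file_1.readline())

with open(write_path) as f:
    print(f.read())  # hello
```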


the "IDLE" IDE was another reference. Eric Idle was a member of Monty Python. I would presume that Integrated DeveLopment Environment was a backronym.

There used to be more references in the community: PyPI used to be "the Cheese Shop", and there was an old refactoring tool called "Bicycle Repair Man". I keep seeing more snake references than Monty Python references these days; I wonder if everyone's still in on the origins or if they've just run out of references.