Useful Python 3 features

Division

In Python 2, integer division is the default, so 1/2 evaluates to 0. This means frequently having to explicitly convert integers to floats when working with integer variables

 >>> int_one = 1
 >>> int_two = 2
 >>> int_one / int_two
 0
 >>> float(int_one) / int_two
 0.5

or being careful to do things like / 2. or * 0.5. In Python 3, the default division will yield a float, and integer division is accessed using the // operator

 >>> int_one / int_two
 0.5
 >>> int_one // int_two
 0

This makes it safer to use by default, since there there is no longer any implicit conversion to integers.

Recursive glob

A small but very useful feature in Python 3 is the addition of a recursive option in the built-in glob() function. In Python 2 and 3, this function can be used to find all files and directories matching a certain pattern

 >>> import os
 >>> import glob
 >>> glob.glob(os.path.join('data', '*.fits'))
 ['data/image.fits']

Now let’s say that the data directory now contains FITS files both directly in data and in sub-directories of data. In Python 3, you can now do

 >>> import os
 >>> import glob
 >>> results = glob.glob(
 ...     os.path.join('data', '**', '*.fits'), recursive=True)
 >>> sorted(results)
 ['data/image.fits', 'data/subset1/a.fits', 'data/subset1/b.fits',
  'data/subset1/c.fits', 'data/subset2/d.fits', 'data/subset2/e.fits']

The ** is used to indicate the point in the path at which to look for recursive directories, and the recursive=True option is needed to correctly interpret the **.

Note

We use os.path.join instead of writing out the path by hand (e.g. data/*.fits) to make sure that this works on Windows as well as Linux and MacOS X.

File path manipulation

The Python 3 standard library includes the pathlib library which provides the Path() object to fulfill all your path manipulation needs. In short it basically replaces os.makedirs(), os.mkdir(), os.path, and glob.glob() in one fell swoop.

One of the nicest features is file path concatenation, replacing the cumbersome os.path.join() with the elegant:

>>> from pathlib import Path
>>> usr = Path('/usr')
>>> config = usr / '.config' / 'pep8'
>>> str(config)
'/usr/.config/pep8'
>>> config.name
'pep8'

Matrix multiplication operator

Since Python 3.5, and Numpy 1.10, it is now possible to use the @ operator to do matrix multiplication (vector product)

 >>> import numpy as np
 >>> x = np.array([[1, 2], [3, 4]])
 >>> y = np.array([[3, 2], [2, -1]])
 >>> x @ y
 array([[ 7,  0],
        [17,  2]])

Note that this is different from x * y, which returns an element-wise multiplication of the arrays:

 >>> x * y
 array([[ 3,  4],
        [ 6, -4]])

Clearing lists

In Python 2 and 3, dictionaries can easily be emptied using the .clear method:

 >>> d = {'flux': 1}
 >>> d.clear()
 >>> d
 {}

But Python 2.7 did not allow lists to be cleared in the same way:

 >>> li = ['spam', 'egg', 'spam']
 >>> li.clear()
 Traceback (most recent call last):
 ...
 AttributeError: 'list' object has no attribute 'clear'

instead requiring non-intuitive code such as:

 >>> del li[:]
 >>> li
 []

Since Python 3.3, lists can be emptied by using the clear method:

 >>> li = ['spam', 'egg', 'spam']
 >>> li.clear()
 >>> li
 []

Advanced print function

One of the widely known changes between Python 2 and Python 3 is the change from a print statement to a print() function. This change is not just esthetic, it now allows you to better customize aspects such as what separator to use between variables, and whether to go to the next line between successive print statements.

By default, print() behaves like the Python 2 print statement in that it separates variables by spaces and goes to the next line at the end of a print call:

 >>> a, b = 1, 2
 >>> print(a, b)
 1 2

The sep argument can be used to customize the separator:

 >>> print(a, b, sep=', ')
 1, 2

And similarly, the end argument can be used to customize the end of the line - this defaults to \n, which is a carriage return (or newline):

 >>> print("hello"); print("world")
 hello
 world
 >>> print("hello", end=' '); print("world")
 hello world

In the above example, we had to put the print statements on the same line, because in interactive Python, you will be returned to the Python prompt after the line is executed. However, in scripts, you can do

 print("hello ", end=' ')
 print("world")

Finally, a last useful feature is that it is possible to send the output of the print calls to file-like objects instead of the main terminal output (the standard output):

 >>> f = open('data.txt', 'w')
 >>> print(a, b, file=f)
 >>> f.close()

or better, if you are familiar with the context manager notation:

 >>> with open('data.txt', 'w') as f:
 ...     print(a, b, file=f)

Advanced unpacking

In Python 2, you can use implicit unpacking of variables to go from a list, tuple, or more generally any iterable to separate variables:

 >>> a, b, c = range(3)
 >>> a
 0
 >>> b
 1
 >>> c
 2

The number of items in the iterable on the right has to match exactly the number of variables on the left. However, there are cases where one might only be interested in the first few items of the iterable. For example, if you have a list of 5 items

 >>> values = range(5)

and are only interested in the first two, in Python 2 you would need to do either:

 >>> a, b, _, _, _ = values

or

 >>> a = values[0]
 >>> b = values[1]

Python 3 now allows users to use the *variable syntax (similar to *args in function arguments) to avoid having to write out as many variables than items in the iterable

 >>> a, b, *rest = values
 >>> a
 0
 >>> b
 1
 >>> rest
 [2, 3, 4]

The * syntax can also be used for e.g. the first variable and variables in the middle

 >>> a, *rest, b = range(5)
 >>> a, b
 (0, 4)
 >>> *rest, a, b = range(5)
 >>> a, b
 (3, 4)

This can be used for example to access the first two lines and the last line in a file:

 >>> f = open('data.txt')
 >>> first, second, *rest, last = f.readlines()
 >>> f.close()

Function annotations

Since Python 3.5, it is possible to use the following syntax to annotate functions, to provide information on inputs/outputs. For example, it is possible to specify type annotations:

 >>> def remove_spaces(x: str) -> str:
 ...     return x.replace(' ', '')

This syntax means that the input as well as the output should be a string. Now it turns out that Python doesn’t do anything with these type annotations (there are still reasons why developers might want to do this, but this is not necessarily critical for the typical user).

However, some packages have now implemented their own annotations. For example, the Astropy package uses these to allow users to specify what units different variables should be in:

 >>> import astropy.units as u
 >>> @u.quantity_input
 ... def kinetic_energy(mass: u.kg, velocity: u.m / u.s):
 ...    return 0.5 * mass * velocity ** 2

This does then raise an error if the variables do not have units attached:

 >>> kinetic_energy(1, 3)
 Traceback (most recent call last):
 ...
 TypeError: Argument 'mass' to function 'kinetic_energy' has no 'unit'
 attribute. You may want to pass in an Astropy Quantity instead.

or if the units are not compatible/convertible:

 >>> kinetic_energy(1 * u.s, 3 * u.km / u.s)
 Traceback (most recent call last):
 ...
 UnitsError: Argument 'mass' to function 'kinetic_energy' must be in
 units convertible to 'kg'.

Other packages will hopefully also provide useful annotations such as these!

Sensible comparison

In Python 2, it was possible to compare things that shouldn’t really be comparable:

 >>> '1' > 2
 True

Whether a string was greater than an integer or a float was not necessarily predictable or intuitive. In Python 3, this type of comparison is no longer allowed:

 >>> '1' > 2
 Traceback (most recent call last):
 ...
 TypeError: '>' not supported between instances of 'str' and 'int'

This should avoid quite a few bugs!

String interpolation

Python 3.6 includes a new type of strings: f-strings. The idea is that when doing string formatting, we can often end up in cases that are too verbose such as:

 >>> value = 4 * 20
 >>> 'The value is {value}.'.format(value=value)
 'The value is 80.'

or we can end up in situations where the code is unnecessarily complex, since value is detached from where it appears in the string.

 >>> 'The value is {}.'.format(value)
 'The value is 80.'

The new f-strings allow you to use variable names directly inside the curly brackets:

 >>> f'The value is {value}.'
 'The value is 80.'

You can actually use full Python expressions inside the curly brackets! For instance:

 >>> a, b = 10, 20
 >>> f'The sum of the values is {a + b}.'
 'The sum of the values is 30.'

Underscores in numbers

Have you ever had issues figuring out whether 100000000 is a hundred million or a billion? In Python 3.6, you can now add underscores anywhere in an integer, which allows you to do e.g.:

 >>> a = 1_000_000_000

This also works with hexadecimal and binary literals, e.g.

 >>> b = 0b_0011_1111_0100_1110

Unicode strings

In Python 2, only the basic ASCII character set was available in standard strings; to use the much more extensive Unicode set of characters, you had to prefix each string with a u:

 >>> s1 = "an ascii string"
 >>> s2 = u"The total is €10"

Unicode strings are the default in Python 3. This makes it more straightforward to e.g., include foreign languages, and print greek symbols (or emoji) in strings:

 >>> s3 = "Πύθων"
 >>> s4 = "unicode strings are great! 😍"

Unicode variable names

Python 3 allows many unicode symbols to be used in variable names. Unlike Julia or Swift, which allow any unicode symbol to represent a variable (including emoji) Python 3 restricts variable names to unicode characters that represent characters in written languages. In contrast, Python 2 could only use the basic ASCII character set for variable names.

This means you can use foreign language words and letter-like symbols as variable names, e.g.:

 >>> π = 3.14159
 >>> jalapeño = "a hot pepper"
 >>> ラーメン = "delicious"

But cannot use, say, emoji:

 >>> ☃ = "brrr!"
 Traceback (most recent call last):
 ...
 SyntaxError: invalid character in identifier

One nice use case is for mathematical notation:

from numpy import array, cos, sin
def rotate(vector, angle):
     θ = angle
     mat = [[cos(θ), -sin(θ)],
            [sin(θ), cos(θ)]]
     mat = array(mat)
     return mat @ vector

Using unicode variable names like this can make it easier to read complicated mathematical expressions and compare with the printed definition. Be careful not to expose unicode variable names in your project’s API, as it might be difficult for others to type these characters. Also, use caution if you’re planning to share your code as it’s fairly easy to produce illegible code this way.

More useful exceptions

Python 3 makes some error cases easier to catch. For example, to open a file and catch the error if it’s not there:

 try:
     f = open('is_it_there.txt')
 except FileNotFoundError:
     # Fallback code...

Doing this in Python 2 is more complicated:

 import errno

 try:
     f = open('is_it_there.txt')
 except OSError as e:
     if e.errno == errno.ENOENT:
         # Fallback code...
     else:
         raise  # It was an OSError for something else

Other new exception classes include PermissionError, IsADirectoryError and TimeoutError. For more information, see the Python documentation.