An introduction to research programming with Python

Variables

Variable assignment

Python has built-in support for arithmetic expressions and so can be used as a calculator. When we evaluate an expression in Python, the result is displayed, but not necessarily stored anywhere.

In [1]:
2 + 2
Out[1]:
4
In [2]:
4 * 2.5
Out[2]:
10.0

If we want to access the result in subsequent code, we have to store it. We put it in a box, with a name on the box. This is a variable. In Python we assign a value to a variable using the assignment operator, =:

In [3]:
four = 2 + 2
four
Out[3]:
4

As well as numeric literal values, Python also has built-in support for representing textual data as sequences of characters, which in computer science terminology are termed strings. Strings in Python are indicated by enclosing their contents in either a pair of single quotation marks '...' or a pair of double quotation marks "...", for example:

In [4]:
greeting = "hello world"
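
As a brief sketch of the two quoting styles (the variable names below are illustrative and not part of the original notebook), both forms produce the same string, and choosing one style lets the other quotation mark appear inside the text:

```python
# Both quoting styles create a string with exactly the same contents.
single_quoted = 'hello world'
double_quoted = "hello world"
print(single_quoted == double_quoted)

# Using one style lets the other quotation mark appear inside the string.
apostrophe = "it's a string"
quotation = 'she said "hello"'
print(apostrophe)
print(quotation)
```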

Naming variables

We can name variables with any combination of lowercase and uppercase letters, digits and underscores (_), provided the first character is not a digit and the name is not a reserved keyword.

In [5]:
fOuR = 4
In [6]:
four_integer = 4
In [7]:
integer_4 = 4
In [8]:
# invalid as name cannot begin with a digit
4_integer = 4
  Cell In [8], line 2
    4_integer = 4
     ^
SyntaxError: invalid decimal literal
In [9]:
# invalid as for is a reserved word
for = 4
  Cell In [9], line 2
    for = 4
        ^
SyntaxError: invalid syntax
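
The naming rules can also be checked without triggering a SyntaxError. The following is a minimal sketch using the standard library's keyword module and the str.isidentifier method; the names candidate_names and validity are illustrative choices, not part of the original notebook.

```python
import keyword

candidate_names = ["fOuR", "four_integer", "integer_4", "4_integer", "for"]

validity = {}
for candidate in candidate_names:
    # str.isidentifier checks the character rules (for example, no leading
    # digit); keyword.iskeyword checks whether the name is a reserved word.
    validity[candidate] = candidate.isidentifier() and not keyword.iskeyword(candidate)

for candidate, is_valid in validity.items():
    print(candidate, is_valid)
```

The full list of reserved words for the running Python version is available as keyword.kwlist.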

It is good practice to give variables descriptive and meaningful names to help make code self-documenting. As most modern development environments (including JupyterLab!) offer tab completion, longer names cost little in extra keystrokes. Note, however, that the names we give variables only have meaning to us; Python does not check that a name matches the value it labels:

In [10]:
two_plus_two = 5

Aside: Reading error messages

We have already seen a couple of examples of Python error messages. It is important, when learning to program, to develop an ability to read an error message and find, from what can initially seem like confusing noise, the bit of the error message which tells you where the problem is.

For example, consider the following:

In [11]:
number_1 = 1
number_2 = "2"
sum_of_numbers = number_1 + number_2
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [11], line 3
      1 number_1 = 1
      2 number_2 = "2"
----> 3 sum_of_numbers = number_1 + number_2

TypeError: unsupported operand type(s) for +: 'int' and 'str'

We may not yet know what TypeError or Traceback refer to. However, we can see that the error happens on the third line of our code cell. We can also see that the error message:

unsupported operand type(s) for +: 'int' and 'str'

...tells us something important. Even if we don't understand the rest, this is useful for debugging!
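
Once the message has pointed us at the problem (adding an integer to a string), we can act on it. The sketch below shows one possible fix, assuming the intention was to add the two values as numbers; converting with int is our assumption about what was wanted, not the only valid repair.

```python
number_1 = 1
number_2 = "2"

try:
    sum_of_numbers = number_1 + number_2
except TypeError as error:
    # The exception carries the same message shown on the traceback's last line.
    print("Caught:", error)

# One possible fix: convert the string to an integer before adding.
sum_of_numbers = number_1 + int(number_2)
print(sum_of_numbers)
```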

Undefined variables and None

If we try to evaluate a variable that hasn't ever been defined, we get an error.

In [12]:
seven
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In [12], line 1
----> 1 seven

NameError: name 'seven' is not defined

In Python, names are case-sensitive, so for example six, Six and SIX are all different variable names.

In [13]:
six = 6
six
Out[13]:
6
In [14]:
Six
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In [14], line 1
----> 1 Six

NameError: name 'Six' is not defined

There is a special None keyword in Python which can be assigned to a variable to indicate that it has no value. This is not the same as an undefined variable:

In [15]:
nothing
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In [15], line 1
----> 1 nothing

NameError: name 'nothing' is not defined
In [16]:
nothing = None
nothing
In [17]:
print(nothing)
None
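
The difference can also be tested in code. This is a small sketch in which never_assigned is a hypothetical name deliberately left undefined, and caught_name_error is an illustrative flag:

```python
nothing = None

# A variable set to None is defined, so it can be tested with `is`.
print(nothing is None)

# A name that was never assigned raises NameError when evaluated.
caught_name_error = False
try:
    never_assigned  # hypothetical name, deliberately left undefined
except NameError as error:
    caught_name_error = True
    print("Caught:", error)
```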

Anywhere we can use a literal value, we can instead use a variable name, for example:

In [18]:
5 + four * six
Out[18]:
29
In [19]:
scary = six * six * six
scary
Out[19]:
216

Supplementary Materials: There's more on variables at Software Carpentry.

Reassignment and multiple labels

We can reassign a variable, that is, change what is in the box the variable labels.

In [20]:
scary = 25
scary
Out[20]:
25

The data that was previously labelled by the variable is lost. No labels refer to it any more, so it has been garbage collected. We might imagine something pulled out of the box, and disposed of, to make way for the next occupant. In reality, though, it is the label that has moved.

We can see this more clearly if we have more than one label referring to the same box

In [21]:
name = "Grace Hopper"
nom = name
print(name)
print(nom)
Grace Hopper
Grace Hopper

and we move just one of those labels:

In [22]:
nom = "Grace Brewster Murray Hopper"
print(name)
print(nom)
Grace Hopper
Grace Brewster Murray Hopper
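
Python's is operator tests whether two names refer to the same object, so it can confirm what the printed output suggests. A minimal sketch reusing the names from the cells above:

```python
name = "Grace Hopper"
nom = name

# Both labels refer to the same object, so the identity test succeeds.
print(nom is name)

# Moving one label leaves the other label, and its object, untouched.
nom = "Grace Brewster Murray Hopper"
print(nom is name)
print(name)
```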

Variables and memory

We can now better understand our mental model of variables as labels and boxes: each box is a piece of space (an address) in computer memory. Each label (variable) is a reference to such a place and the data contained in the memory defines an object in Python. Python objects come in different types - so far we have encountered both numeric (integer) and textual (string) types - more on this later.

When the number of labels on a box (variables referencing an address) gets down to zero, the data in the box cannot be accessed any more. This will trigger Python's garbage collector, which will then 'empty' the box (deallocate the memory at the address), making it available again to store new data.

Lower-level languages such as C and Fortran do not have garbage collectors as a standard feature. So a memory address with no references to it, and which has not been explicitly marked as free, remains unavailable for other usage, which can lead to difficult-to-fix memory leak bugs.
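
Garbage collection in Python can be observed, at least roughly, with the standard library's weakref module. The sketch below makes some assumptions: Box is a hypothetical class standing in for any object (built-in strings and integers cannot be weakly referenced), and the explicit gc.collect() call is there only in case a Python implementation defers cleanup.

```python
import gc
import weakref


class Box:
    """Hypothetical stand-in for any object held in memory."""


label = Box()
# A weak reference lets us observe the object without counting as a label.
observer = weakref.ref(label)
print(observer() is label)

# Moving the only label away leaves the object with no references.
label = None
# Ask the collector to run explicitly in case cleanup has been deferred.
gc.collect()
# Once the object has been reclaimed, the weak reference returns None.
print(observer() is None)
```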

When we execute

In [23]:
name = "Grace Hopper"
nom = name
nom = "Grace Brewster Murray Hopper"
name = "Admiral Hopper"

the following happens:

  1. A new text (string) object "Grace Hopper" is created at a free address in memory and the variable name is set to refer to that address.
  2. The variable nom is set to refer to the object at the address referenced by name.
  3. A new text (string) object "Grace Brewster Murray Hopper" is created at a free address in memory and the variable nom is set to refer to that address.
  4. A new text (string) object "Admiral Hopper" is created at a free address in memory, the variable name is set to refer to that address, and the garbage collector deallocates the memory used to hold "Grace Hopper" as this memory is no longer referenced by any variables.

Supplementary materials: The website Python Tutor has a great interactive tool for visualising how memory and references work in Python. Try the scenario we just looked at.

Variables in notebooks and kernels

When code cells are executed in a notebook, the variable names and values of the referenced objects persist between cells.

In [24]:
number = 1

So if we change a variable in one cell

In [25]:
number = number + 1

it keeps its new value for the next cell.

In [26]:
number
Out[26]:
2

In Jupyter terminology the Python process in which we run the code in a notebook is termed a kernel. The kernel stores the variable names and referenced objects created by any code cells in the notebook that have been previously run. The Kernel menu in the menu bar at the top of the JupyterLab interface has an option Restart kernel... which restarts the kernel currently being used by the notebook, clearing any currently defined variables and returning the kernel to a 'clean' state. As you cannot restore a kernel once it has been restarted, a warning message is displayed asking you to confirm that you wish to restart.

Cell run order

Cells do not have to be evaluated in the order they appear in a notebook.

If we go back to the code cell above with contents number = number + 1 and run it again with shift+enter, then number will change from 2 to 3; running it once more changes it from 3 to 4. Try it!

However, running cells out of order like this can make it hard to keep track of what values are currently assigned to variables. It also makes it difficult to reproduce computations, as getting the same output requires rerunning the cells in the same order, and it will not always be possible to reconstruct the order that was used.
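
The effect of rerunning a cell can be imitated outside a notebook. The following is only a simulation under stated assumptions, not how Jupyter is implemented: a plain dictionary (kernel_namespace, an illustrative name) stands in for the kernel's store of variables, and the built-in exec function stands in for running a cell.

```python
# A plain dictionary stands in for the kernel's store of variables.
kernel_namespace = {}

first_cell = "number = 1"
second_cell = "number = number + 1"

# Running each cell once, in order, leaves number equal to 2.
exec(first_cell, kernel_namespace)
exec(second_cell, kernel_namespace)
print(kernel_namespace["number"])

# Rerunning only the second cell twice more changes the stored value,
# even though the text of the cells is unchanged.
exec(second_cell, kernel_namespace)
exec(second_cell, kernel_namespace)
print(kernel_namespace["number"])
```

The same two cells give different results depending on how many times each was run, which is why the run history matters for reproducibility.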

The number in square brackets in the prompt to the left of code cells, for example [1]:, indicates the position in the overall cell run order of the last run of the cell. This lets us establish whether a cell last ran before or after another cell. However, if some cells are run multiple times, their previous run counter values are overwritten, so we lose information about the run order.

In general, if you are using notebooks in your own research, you should try to make sure the notebook runs and produces the desired outputs when the cells are executed sequentially from top to bottom. The Kernel menu provides an option to restart the current kernel and run all cells in order from top to bottom. If you just want to run a subset of the cells, there is also an option to restart and run all cells from the top to the currently selected cell. These commands are useful for checking that a notebook will produce the expected output and run without errors when the cells are executed in order.