Lists in Python

Fri 06 June 2014 | tags: python

Overview

List is basic and frequently-used data structure in Python. Lists are mutable sequences, typically used to store collections of homogeneous items. You may make stupied mistakes and get stuck with the mutability of the lists. Here will introduce some advanced technologies in python lists.

List comprehensions

List comprehension is a compact way to process all or part of the elements in a sequence and return a list with the results. Here illustrates a standard list comprehension construct.

           (4)   (5)
            |     |
[ x * 2 for x in nums if x > 0 ]
 |<--->|<----------->|<------>|
   (1)      (2)         (3)

   (1) output expressions
   (2) loop clause
   (3) conditional clause (optional)
   (4) variable
   (5) input sequence

Let's see some examples:

>>> [x**2 for x in range(10)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> [x**2 for x in range(10) if x > 4 ]
[25, 36, 49, 64, 81]
>>> [(x, y) for x in range(3) for y in range(3)]
[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]
>>> [(x, y) for x in range(3) for y in range(3) if x == y]
[(0, 0), (1, 1), (2, 2)]

If the output expression is tuple, it must be parenthesized.

>>> [(x, x**2) for x in range(3)]
[(0, 0), (1, 1), (2, 4)]
>>> [x, x**2 for x in range(3)]
   File "<stdin>", line 1
      [x, x**2 for x in range(3)]
                 ^
SyntaxError: invalid syntax

Flatten a list using a list comprehension with two 'for's

>>> vec = [[1,2,3], [4,5,6], [7,8,9]]
>>> [nu for e in vec for nu in e]
[1, 2, 3, 4, 5, 6, 7, 8, 9]

For output expression, it can be any arbitrary expression, including another list comprehension.

Here is a nested list, actually, it may be thought of as a 3x4 matrix.

mtrx = [[1, 2,  3,  4],
        [5, 6,  7,  8],
        [9, 10, 11, 12],
       ]

Let's make mtrx transposed using list comprehensions.

>>> [[r[i] for r in mtrx] for i in range(4)]
[[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]

Of course, zip() may be the best solution for this case.

>>> list(zip(*mtrx))
[(1, 5, 9), (2, 6, 10), (3, 7, 11), (4, 8, 12)]

Initialization on nested list

To initialize a nested list, we may have several approaches to complete this.

# method #1
>>> l1 = [[0] * 2] * 3 # list multiply
>>> l1
[[0, 0], [0, 0], [0, 0]]

# method #2
>>> l2 = [[0 for _ in range(2)] for _ in range(3)] # list comprehension
>>> l2
[[0, 0], [0, 0], [0, 0]]

Let's see what's the difference between the two approaches.

>>> l1[2][1] = 'abc'
>>> l1
[[0, 'abc'], [0, 'abc'], [0, 'abc']]   # OMG, each element is changed
#                              ^       # we just want to change the position
                                       # signed with "^".

>>> l2[2][1] = 'xyz'
>>> l2
[[0, 0], [0, 0], [0, 'xyz']]           # this is the situation that we want.

How does this happen? Why does the two approaches have different results when we chang the element? Here for l1, the shallow copys only happened, nested list structures are not copied. So in the 1st level of this nested list, the element of the list refers to the same list object. The changes to any element will be reflected in each element. So if you don't need synced nested lists, please build lists with deep copies instead of shallow copies. You can refer to Shallow and deep copy operations for diving into details.

Modify list you are iterating over

If you need to modify the list you are iterating over while inside the loop, it is highly recommended that you first make a copy. Iterating over the list doesn't implicitly make a copy. But this slice notation make this specially convenient.

>>> languages = ['java', 'python', 'bash']
>>> for lang in languages[:]:
...     if len(lang) == 6:
...         languages.insert(0, lang)
...
>>> languages
['python', 'java', 'python', 'bash']

# what if we use languages directly?
>>> languages = ['java', 'python', 'bash']
>>> for lang in languages:
...     if len(lang) == 6:
...         languages.insert(0, lang)
...     if len(languages) > 9: # avoid infinite loops
...         break
...
>>> languages
['python', 'python', 'python', 'python', 'python', 'python', 'python', 'java', 'python', 'bash']
>>>

So you should avoid operating on the modified list by making a copy of list.

Count hashable items

Sometimes, maybe we want to count hashable items in a list. For example:

words = ['bar', 'baz', 'foo', 'bar', 'bar', 'foo']

output:

[('bar', 3), ('baz', 1), ('foo', 2)]

We can just use collections.Counter

>>> from collections import Counter
>>> words_cnt = Counter(words)
>>> words_cnt
Counter({'bar': 3, 'foo': 2, 'baz': 1})
>>> words_cnt.most_common(len(words))
[('bar', 3), ('foo', 2), ('baz', 1)]

Here a Counter object is acutally a dictionary which stores elements as dictionary keys and stores their counts as dictionary values. A Counter is a dict subclass for counting hashable objects.

Remove duplicated items

If we want to remove the duplicates elements in a list, what should we do? You may think of the simplies way is to use set() if the elements are hashable.

>>> a = ['a', 'b', 'foo', 'bar', 'a', 'bar', 'e', 'xyz']
>>> list(set(a))
['bar', 'e', 'foo', 'b', 'a', 'xyz']

But did you see it? The list output did not maintain the original order. That's the problem. We should avoid using this solution unless you absolutely don't care its original order.

Here is the another solution to preserve the order.

def rm_dups(lst):
    non_dups = set()
    for e in lst:
        if e not in non_dups:
            yield e
            non_dups.add(e)

In this code example, we assume elements in list are hashable. If the elements are unhashable (such dicts), we may make a slight change to this method.

def rm_dups(lst, key=None):
    non_dups = set()
    for e in lst:
        if key is None:
            val = e        # hashable
        else:
            val = key(e)   # unhashable
        if val not in non_dups:
            yield e
            non_dups.add(val)

>>> d = [{'a': 3}, {'b': 4}, {'a': 3}, {'c': 5}, {'d': 5}, {'c': 5}]
>>> list(rm_dups(d, key=lambda item: tuple(item.values()))) # python 3
[{'a': 3}, {'b': 4}, {'c': 5}]

Actually, the argument key should be a function in order to get values to be compared.

Sort a list

blogroll

social