List comprehensions
List comprehension is a compact way to process all or part of the elements in a
sequence and return a list with the results. Here illustrates a standard list
comprehension construct.
(4) (5)
| |
[ x * 2 for x in nums if x > 0 ]
|<--->|<----------->|<------>|
(1) (2) (3)
(1) output expressions
(2) loop clause
(3) conditional clause (optional)
(4) variable
(5) input sequence
Let's see some examples:
>>> [x**2 for x in range(10)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> [x**2 for x in range(10) if x > 4 ]
[25, 36, 49, 64, 81]
>>> [(x, y) for x in range(3) for y in range(3)]
[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]
>>> [(x, y) for x in range(3) for y in range(3) if x == y]
[(0, 0), (1, 1), (2, 2)]
If the output expression is tuple, it must be parenthesized.
>>> [(x, x**2) for x in range(3)]
[(0, 0), (1, 1), (2, 4)]
>>> [x, x**2 for x in range(3)]
File "<stdin>", line 1
[x, x**2 for x in range(3)]
^
SyntaxError: invalid syntax
Flatten a list using a list comprehension with two 'for's
>>> vec = [[1,2,3], [4,5,6], [7,8,9]]
>>> [nu for e in vec for nu in e]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
For output expression, it can be any arbitrary expression, including another
list comprehension.
Here is a nested list, actually, it may be thought of as a 3x4 matrix.
mtrx = [[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
]
Let's make mtrx transposed using list comprehensions.
>>> [[r[i] for r in mtrx] for i in range(4)]
[[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]
Of course, zip() may be the best solution for this case.
>>> list(zip(*mtrx))
[(1, 5, 9), (2, 6, 10), (3, 7, 11), (4, 8, 12)]
Initialization on nested list
To initialize a nested list, we may have several approaches to complete this.
# method #1
>>> l1 = [[0] * 2] * 3 # list multiply
>>> l1
[[0, 0], [0, 0], [0, 0]]
# method #2
>>> l2 = [[0 for _ in range(2)] for _ in range(3)] # list comprehension
>>> l2
[[0, 0], [0, 0], [0, 0]]
Let's see what's the difference between the two approaches.
>>> l1[2][1] = 'abc'
>>> l1
[[0, 'abc'], [0, 'abc'], [0, 'abc']] # OMG, each element is changed
# ^ # we just want to change the position
# signed with "^".
>>> l2[2][1] = 'xyz'
>>> l2
[[0, 0], [0, 0], [0, 'xyz']] # this is the situation that we want.
How does this happen? Why does the two approaches have different results when
we chang the element? Here for l1, the shallow copys only happened, nested list
structures are not copied. So in the 1st level of this nested list, the element
of the list refers to the same list object. The changes to any element will be
reflected in each element. So if you don't need synced nested lists, please build
lists with deep copies instead of shallow copies. You can refer to
Shallow and deep copy operations for diving into details.
Modify list you are iterating over
If you need to modify the list you are iterating over while inside the loop, it
is highly recommended that you first make a copy. Iterating over the list doesn't
implicitly make a copy. But this slice notation make this specially convenient.
>>> languages = ['java', 'python', 'bash']
>>> for lang in languages[:]:
... if len(lang) == 6:
... languages.insert(0, lang)
...
>>> languages
['python', 'java', 'python', 'bash']
# what if we use languages directly?
>>> languages = ['java', 'python', 'bash']
>>> for lang in languages:
... if len(lang) == 6:
... languages.insert(0, lang)
... if len(languages) > 9: # avoid infinite loops
... break
...
>>> languages
['python', 'python', 'python', 'python', 'python', 'python', 'python', 'java', 'python', 'bash']
>>>
So you should avoid operating on the modified list by making a copy of list.
Count hashable items
Sometimes, maybe we want to count hashable items in a list. For example:
words = ['bar', 'baz', 'foo', 'bar', 'bar', 'foo']
output:
[('bar', 3), ('baz', 1), ('foo', 2)]
We can just use collections.Counter
>>> from collections import Counter
>>> words_cnt = Counter(words)
>>> words_cnt
Counter({'bar': 3, 'foo': 2, 'baz': 1})
>>> words_cnt.most_common(len(words))
[('bar', 3), ('foo', 2), ('baz', 1)]
Here a Counter object is acutally a dictionary which stores elements as
dictionary keys and stores their counts as dictionary values. A Counter
is a dict subclass for counting hashable objects.
Remove duplicated items
If we want to remove the duplicates elements in a list, what should
we do? You may think of the simplies way is to use set() if the
elements are hashable.
>>> a = ['a', 'b', 'foo', 'bar', 'a', 'bar', 'e', 'xyz']
>>> list(set(a))
['bar', 'e', 'foo', 'b', 'a', 'xyz']
But did you see it? The list output did not maintain the original order.
That's the problem. We should avoid using this solution unless you absolutely
don't care its original order.
Here is the another solution to preserve the order.
def rm_dups(lst):
non_dups = set()
for e in lst:
if e not in non_dups:
yield e
non_dups.add(e)
In this code example, we assume elements in list are hashable. If the elements
are unhashable (such dicts), we may make a slight change to this method.
def rm_dups(lst, key=None):
non_dups = set()
for e in lst:
if key is None:
val = e # hashable
else:
val = key(e) # unhashable
if val not in non_dups:
yield e
non_dups.add(val)
>>> d = [{'a': 3}, {'b': 4}, {'a': 3}, {'c': 5}, {'d': 5}, {'c': 5}]
>>> list(rm_dups(d, key=lambda item: tuple(item.values()))) # python 3
[{'a': 3}, {'b': 4}, {'c': 5}]
Actually, the argument key should be a function in order to get values to be
compared.