List comprehensions and generators¶
Nested list comprehensions¶
- Form: [[output expression] for iterator variable in iterable]
- In the nested form, the output expression of the outer comprehension is itself a list comprehension
- Collapses for loops that build lists into a single line
- Components
  - Iterable
  - Iterator variable (represents the members of the iterable)
  - Output expression
In [1]:
# Create a 5 x 5 matrix using a list of lists: matrix
matrix = [[col for col in range(5)] for row in range(5)]
# Print the matrix
for row in matrix:
    print(row)
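For comparison, the nested comprehension can be unrolled into explicit loops. This is a minimal sketch; the names matrix_loops and new_row are introduced here for illustration only.

```python
# Build the same 5 x 5 matrix with explicit nested loops.
# matrix_loops and new_row are hypothetical names used only in this sketch.
matrix_loops = []
for row in range(5):
    new_row = []
    for col in range(5):
        new_row.append(col)
    matrix_loops.append(new_row)

# The comprehension form produces an identical list of lists.
matrix = [[col for col in range(5)] for row in range(5)]
print(matrix_loops == matrix)
```

The inner comprehension plays the role of the inner loop, and the outer comprehension collects one inner list per row.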
In [7]:
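# Two for clauses in one comprehension: every pairing of num1 and num2
pair_2 = [(num1, num2) for num1 in range(0, 2) for num2 in range(6, 8)]
pair_2
Out[7]:
[(0, 6), (0, 7), (1, 6), (1, 7)]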
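When a comprehension has two for clauses, the leftmost clause acts as the outer loop. The sketch below spells that out with explicit loops; pairs_loop is a name made up for this example.

```python
# The leftmost `for` is the outer loop; the next `for` is the inner loop.
# pairs_loop is a hypothetical name used only in this sketch.
pairs_loop = []
for num1 in range(0, 2):
    for num2 in range(6, 8):
        pairs_loop.append((num1, num2))

pair_2 = [(num1, num2) for num1 in range(0, 2) for num2 in range(6, 8)]
print(pairs_loop == pair_2)
```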
Using conditionals in comprehensions¶
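- Filter with a trailing condition: [ output expression for iterator variable in iterable if predicate expression ]
- Or put the conditional in the output expression: [ value if predicate expression else other value for iterator variable in iterable ]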
In [2]:
# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']
# Create list comprehension: new_fellowship
new_fellowship = [member for member in fellowship if len(member) >= 7]
# Print the new list
print(new_fellowship)
In [3]:
# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']
# Create list comprehension: new_fellowship
new_fellowship = [member if len(member) >= 7 else '' for member in fellowship]
# Print the new list
print(new_fellowship)
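The two cells above differ in an important way: a trailing if drops elements, while a conditional in the output expression keeps one element per input. A short sketch of the contrast, using the illustrative names filtered and mapped:

```python
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Trailing `if` filters: members that fail the test are dropped.
filtered = [member for member in fellowship if len(member) >= 7]

# Conditional expression maps: every member yields exactly one element.
mapped = [member if len(member) >= 7 else '' for member in fellowship]

# The filtered list is shorter; the mapped list keeps the original length.
print(len(filtered), len(mapped))
```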
Dict comprehensions¶
- Create dictionaries rather than lists
- Use curly braces {} instead of brackets []
- Write the output expression as a key:value pair, with a colon : separating key and value
In [4]:
# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']
# Create dict comprehension: new_fellowship
new_fellowship = {member:len(member) for member in fellowship}
# Print the new dictionary
print(new_fellowship)
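As a rough illustration, the dict comprehension above corresponds to a loop that assigns one key at a time, and it accepts a trailing if just as a list comprehension does. The names lengths_loop and long_names are assumptions made for this sketch.

```python
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Explicit-loop equivalent of {member: len(member) for member in fellowship}.
lengths_loop = {}
for member in fellowship:
    lengths_loop[member] = len(member)

# A dict comprehension can also filter with a trailing `if`.
long_names = {member: len(member) for member in fellowship if len(member) >= 7}

print(lengths_loop)
print(long_names)
```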
Generator expressions¶
- Same syntax as a list comprehension
- Use ( ) instead of [ ]
- The values are not computed until the generator is iterated over
In [9]:
g = (2 * num for num in range(10))
g
Out[9]:
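Evaluating g only displays a generator object; the values appear once something consumes it. A small sketch of pulling values out, with first, second, rest, and leftover as illustrative names:

```python
g = (2 * num for num in range(10))

# next() pulls one value at a time.
first = next(g)
second = next(g)

# list() consumes whatever is left.
rest = list(g)

# A generator can be consumed only once; it is now exhausted.
leftover = list(g)

print(first, second, rest, leftover)
```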
List comprehensions vs. generators¶
- List comprehension: returns a list, with every element built immediately
- Generator expression: returns a generator object that produces elements on demand
- Both can be iterated over
In [13]:
(num for num in range(10*1000000) if num % 2 == 0)
Out[13]:
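One practical consequence is memory use. The sketch below, assuming Python 3, compares the size of a list with that of the equivalent generator object. It uses a smaller n than the cell above so the list stays cheap to build, and sys.getsizeof reports only the size of the container object itself, not of the integers it refers to.

```python
import sys

n = 100000  # assumption: a smaller range than above, chosen to keep this quick

even_list = [num for num in range(n) if num % 2 == 0]
even_gen = (num for num in range(n) if num % 2 == 0)

# The list holds references to all of its elements; the generator holds only its state.
list_size = sys.getsizeof(even_list)
gen_size = sys.getsizeof(even_gen)
print(list_size > gen_size)

# Both can be iterated over and give the same values.
total = sum(even_gen)
print(total == sum(even_list))
```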
Generator functions¶
Generator functions, like generator expressions, yield a series of values instead of returning a single value. A generator function is defined like a regular function, but it produces each value with the keyword yield instead of return.
- Produce generator objects when called
- Defined like a regular function, with def
- Yield a sequence of values instead of returning a single value
- Generate each value with the yield keyword
In [15]:
def num_sequence(n):
    """Generate values from 0 up to, but not including, n."""
    i = 0
    while i < n:
        yield i
        i += 1
In [17]:
test = num_sequence(7)
print(type(test))
In [21]:
next(test)
Out[21]:
In [22]:
# test.next() exists only in Python 2; the built-in next() works in Python 2 and 3
next(test)
Out[22]:
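To see the whole life cycle of a generator function, the following sketch collects every value with list() and then shows what happens once the generator runs out. The names values, gen, only, and exhausted are illustrative assumptions.

```python
def num_sequence(n):
    """Yield the integers 0, 1, ..., n - 1."""
    i = 0
    while i < n:
        yield i
        i += 1

# list() drives the generator until it finishes.
values = list(num_sequence(3))

# Calling next() on a finished generator raises StopIteration.
gen = num_sequence(1)
only = next(gen)
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True

print(values, only, exhausted)
```

A for loop handles StopIteration automatically, which is why looping over a generator simply ends when the values run out.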
- Extract the column 'created_at' from df and assign the result to tweet_time. Fun fact: the extracted column in tweet_time is a pandas Series.
- Create a list comprehension that extracts the time from each row in tweet_time. Each row is a string that represents a timestamp; the slice entry[11:19] takes the characters at indices 11 through 18 (the 12th to 19th characters), which hold the time. Use entry as the iterator variable and assign the result to tweet_clock_time.
In [27]:
import pandas as pd
df = pd.read_csv('tweets.csv')
# Extract the created_at column from df: tweet_time
tweet_time = df['created_at']
# Extract the clock time: tweet_clock_time
tweet_clock_time = [entry[11:19] for entry in tweet_time]
# Print the extracted times
print(tweet_clock_time[:100])
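The slice can be tried without tweets.csv. The sketch below uses a plain list of made-up timestamps; the strings and the name sample_times are assumptions, and the layout shown (weekday, month, day, then HH:MM:SS) is only assumed to match the created_at column. Iterating over a Series of strings works the same way as iterating over this list.

```python
# Hypothetical timestamps standing in for df['created_at'].
sample_times = [
    'Tue Mar 29 23:40:17 +0000 2016',
    'Tue Mar 29 23:40:19 +0000 2016',
    'Wed Mar 30 00:01:19 +0000 2016',
]

# entry[11:19] covers indices 11 through 18: the eight characters of the clock time.
sample_clock_times = [entry[11:19] for entry in sample_times]
print(sample_clock_times)
```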
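Conditional list comprehensions for time-stamped data¶
- Add a conditional expression to the list comprehension so that it selects only the times for which entry[17:19] is equal to '19'. Because entry[11:19] is the clock time, entry[17:19] is its last two characters (the seconds, if the time is formatted HH:MM:SS).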
In [28]:
# Extract the created_at column from df: tweet_time
tweet_time = df['created_at']
# Extract the clock time: tweet_clock_time
tweet_clock_time = [entry[11:19] for entry in tweet_time if entry[17:19] == '19']
# Print the extracted times
print(tweet_clock_time)
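The filter can be checked on a few made-up timestamps as well. As before, sample_times and its contents are hypothetical and only assumed to share the layout of the created_at column.

```python
# Hypothetical timestamps standing in for df['created_at'].
sample_times = [
    'Tue Mar 29 23:40:17 +0000 2016',
    'Tue Mar 29 23:40:19 +0000 2016',
    'Wed Mar 30 00:01:19 +0000 2016',
]

# Keep only the entries whose characters at indices 17 and 18 are '19'.
times_at_19 = [entry[11:19] for entry in sample_times if entry[17:19] == '19']
print(times_at_19)
```

Note that the comparison is between strings, so '19' must be quoted; comparing the slice with the integer 19 would never match.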