Using List comprehension to establish maximum and minimum values - list-comprehension

Write a Python loop to find the minimum temperature at which a species of plant can survive. The list is provided in the following format: [ {'species': 'Pinus contorta', 'common name': 'Lodgepole Pine', 'min temp (°C)': -40}, {'species': 'Quercus rubra', 'common name': 'Red Oak', 'min temp (°C)': -20}, {'species': 'Betula papyrifera', 'common name': 'Paper Birch', 'min temp (°C)': -30}, {'species': 'Populus tremuloides', 'common name': 'Quaking Aspen', 'min temp (°C)': -35} ]
I tried to write a Python loop that finds the minimum temperature.

To find the minimum value, you can use the built-in min() function:
lst = [
    {
        "species": "Pinus contorta",
        "common name": "Lodgepole Pine",
        "min temp (°C)": -40,
    },
    {"species": "Quercus rubra", "common name": "Red Oak", "min temp (°C)": -20},
    {
        "species": "Betula papyrifera",
        "common name": "Paper Birch",
        "min temp (°C)": -30,
    },
    {
        "species": "Populus tremuloides",
        "common name": "Quaking Aspen",
        "min temp (°C)": -35,
    },
]
min_temp = min(lst, key=lambda d: d['min temp (°C)'])
print(min_temp['min temp (°C)'])
Prints:
-40
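Since the question asks for a loop and the title mentions a list comprehension, here is a minimal sketch of both, using the same data as the question. The names plants, KEY, coldest, lowest and highest are illustrative choices, not part of the original answer.

```python
# Sample data copied from the question.
plants = [
    {"species": "Pinus contorta", "common name": "Lodgepole Pine", "min temp (°C)": -40},
    {"species": "Quercus rubra", "common name": "Red Oak", "min temp (°C)": -20},
    {"species": "Betula papyrifera", "common name": "Paper Birch", "min temp (°C)": -30},
    {"species": "Populus tremuloides", "common name": "Quaking Aspen", "min temp (°C)": -35},
]

KEY = "min temp (°C)"

# Explicit loop: remember the plant with the lowest temperature seen so far.
coldest = plants[0]
for plant in plants[1:]:
    if plant[KEY] < coldest[KEY]:
        coldest = plant

# List comprehension: collect the temperatures, then take min and max.
temps = [plant[KEY] for plant in plants]
lowest, highest = min(temps), max(temps)

print(coldest["common name"], lowest, highest)
```

The loop keeps the whole dictionary, so the species name stays available; the comprehension only keeps the numbers.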

Related

How to filter two lists of dict in Python 3.8

Forgive me if this topic already exists, but I couldn't find it.
I have 3 lists of dicts:
list_1 = [
{'name': "Leonardo Di Caprio", 'films': ["The revenant", "Titanic", "The wold of Wall Street"]},
{'name': "Will Smith", 'films': ["I am a legend", "The pursuit of happyness"]},
{'name': "Robert De Niro", 'films': ["Taxi driver", "The godfather"]}
]
list_2 = [
{'name': "Leonardo Di Caprio", 'films': ["Titanic", "The revenant", "The wold of Wall Street"]},
{'name': "Will Smith", 'films': ["I am a legend", "The pursuit of happyness", "Aladdin"]},
{'name': "Robert De Niro", 'films': ["Taxi driver", "The godfather"]}
]
list_final = [
{'name': "Tom Hanks", 'films': ["Forest Gump", "Cast Away", "Greyhound"]},
{'name': "Will Smith", 'films': ["I am a legend", "The pursuit of happyness"]},
{'name': "Tom Cruise", 'films': ["Top Gun", "Mission impossible"]},
{'name': "Robert De Niro", 'films': ["Taxi driver", "The godfather"]},
{'name': "Leonardo Di Caprio", 'films': ["Titanic", "The revenant", "The wold of Wall Street"]},
{'name': "Harrison Ford", 'films': ["Blade Runner", "Indiana Jones"]},
{'name': "Morgan Freeman", 'films': ["Seven"]}
]
I'd like to create a function that takes 2 lists of dicts as parameters and returns a boolean. The aim is to check whether list_1 is contained in list_final.
By "is contained" I mean:
Every actor name from list_1 must be present in list_final (in any order)
Every film played by a specific actor from list_1 must be present for that actor in list_final
I have working code:
from typing import Dict, List

def isContained(l1: List[Dict[str, List]], l_final: List[Dict[str, List]]) -> bool:
    for elem in l1:
        findOccurence = False
        for element in l_final:
            if elem['name'] == element['name'] and all(item in element['films'] for item in elem['films']):
                findOccurence = True
        if not findOccurence:
            return False
    return True

print(isContained(list_1, list_final)) # True
print(isContained(list_2, list_final)) # False
print(isContained(list_1, list_2)) # True
print(isContained(list_2, list_1)) # False
Output :
root#root:/tmp/TEST_PYTHON$ python3 main.py
True
False
True
False
So it works, but I'm sure there is a more efficient way to write it.
What bothers me is iterating over the entire final list once for every element of list_1.
Any suggestions?
After adjusting your data structures a little to make things more efficient, it can be done using the set intersection operator & and the issubset method of set.
list_1 = {
"Leonardo Di Caprio":{"The revenant", "Titanic", "The wold of Wall Street"},
"Will Smith":{"I am a legend", "The pursuit of happyness"},
"Robert De Niro": {"Taxi driver", "The godfather"}
}
list_2 = {
"Leonardo Di Caprio": {"Titanic", "The revenant", "The wold of Wall Street"},
"Will Smith": {"I am a legend", "The pursuit of happyness", "Aladdin"},
"Robert De Niro": {"Taxi driver", "The godfather"}
}
list_final = {
"Tom Hanks": {"Forest Gump", "Cast Away", "Greyhound"},
"Will Smith": {"I am a legend", "The pursuit of happyness"},
"Tom Cruise": {"Top Gun", "Mission impossible"},
"Robert De Niro": {"Taxi driver", "The godfather"},
"Leonardo Di Caprio": {"Titanic", "The revenant", "The wold of Wall Street"},
"Harrison Ford": {"Blade Runner", "Indiana Jones"},
"Morgan Freeman": {"Seven"}
}
def isContained(l1, l_final) -> bool:
    # Note: this first version compares l1's keys with themselves, so the check always passes
    if (set(l1.keys()).issubset(set(l1.keys()))):
        for key in set(l1.keys()) & set(l_final.keys()):
            if (not (l1[key].issubset(l_final[key]))):
                return False
    return True
After comments from @Stef, the solution has been fixed and is now:
def isContained(l1, l_final) -> bool:
    if (set(l1.keys()).issubset(set(l_final.keys()))):
        for key in set(l1.keys()) & set(l_final.keys()):
            if (not (l1[key].issubset(l_final[key]))):
                return False
    else:
        return False
    return True
An extra test condition is required to confirm the first condition is being met correctly...
list_3 = {"Sigourney Weaver": {"Aliens"}}
And condition:
print(isContained(list_3, list_final)) # False
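The same check can be sketched more compactly with all() and the set <= (subset) operator. This is one possible variant rather than the answerer's code; the name is_contained and the shortened sample data are chosen here for illustration.

```python
def is_contained(subset, superset):
    """Return True if every actor in subset exists in superset with at least the same films."""
    return all(
        name in superset and films <= superset[name]  # <= is the subset test for sets
        for name, films in subset.items()
    )

# Shortened sample data in the dict-of-sets layout used above.
final = {
    "Will Smith": {"I am a legend", "The pursuit of happyness"},
    "Morgan Freeman": {"Seven"},
}

print(is_contained({"Will Smith": {"I am a legend"}}, final))    # True
print(is_contained({"Will Smith": {"Aladdin"}}, final))          # False
print(is_contained({"Sigourney Weaver": {"Aliens"}}, final))     # False
```

Each actor costs one dictionary lookup plus one subset test, so the final collection is never scanned once per actor.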

Parallelise / Threading a big for loop | Python

I have a working Jupyter Notebook that models fake data in the form of a dictionary, using the Faker library and other basic Python.
From reading other posts, parallelism seems to be applied to methods. However, I want to apply this technique to the big for loop, as I have many more "key-value lists" used in the process.
Note: I've added comments showing the list indexing used in the process, in case that's needed.
Is it possible to run multiple iterations of a for loop at once? (or as many as possible)
from faker import Faker
faker = Faker()
Faker.seed(1)
import pandas as pd
import random
random.seed(1)
# Key Value lists
biographic_keys = [['Name', 'faker.unique.name()'], ['Aliases', 'faker.unique.user_name()'], ['Date of birth', 'faker.unique.date_of_birth().isoformat()'], ['Phone numbers', 'faker.unique.phone_number()'], ['Addresses', 'faker.unique.address()'], ['Nationality', 'faker.unique.country()'], ['Social Security Number', 'faker.unique.ssn()'], ['Alien File Number', 'random.randrange(1000000, 999999999, random.randint(7, 9))']]
biometric_keys = [['Height', "'{}ft {}inch'.format(random.randint(4, 7), random.randint(0, 11)) if random.randint(0, 1) == 1 else '{}cm'.format(random.randint(100, 200))"], ['Weight', "'{}kg'.format(random.randint(60, 130)) if random.randint(0, 1) == 1 else '{}st {}lb'.format(random.randint(7, 50), random.randint(0, 13))"], ['Eye color', "random.choice(['Amber', 'Blue', 'Brown', 'Gray', 'Green', 'Hazel'])"], ['Hair color', "random.choice(['Brown', 'Blond', 'Black', 'Auburn', 'Red', 'Gray', 'White', 'Bald'])"]]
entries = 4
alien_key_val = []
alien_key_val.append(["Biographic data", biographic_keys])
alien_key_val.append(["Biometric data", biometric_keys])
#print(alien_key_val[0]) # name, subset
#print(alien_key_val[0][0]) # name
#print(alien_key_val[0][1]) # subset
#print(alien_key_val[0][1][0][0]) # key
#print(alien_key_val[0][1][0][1]) # invoke val
# Programmatic key-values
alien_dict = {}
for entry in range(1, entries+1):
    entry_dict = {}
    for i, subset in enumerate(alien_key_val):
        subset_dict = {}
        subset_name = alien_key_val[i][0]
        for data in subset[1]:
            key, invoc = data[0], data[1]
            #if ('faker.unique.' in invoc) or ('random.' in invoc) or ('tf.' in invoc) or ("''.join" in invoc) or ("'{}" in invoc): val = eval(invoc)
            if invoc[-1] != ':': val = eval(invoc)
            else: val = ""
            if 'Identification numbers' in key: val = {i[0]: i[1] for i in val}
            subset_dict.update({key: val})
        entry_dict.update({subset_name: subset_dict})
    alien_dict.update({'id_' + str(entry): entry_dict})
print("\nALIEN_DICT:\n", alien_dict)
>>> ALIEN_DICT:
{'id_1': {'Biographic data': {'Name': 'Ryan Gallagher', 'Aliases': 'david77', 'Date of birth': '1994-03-12', 'Phone numbers': '(317)066-9074x3915', 'Addresses': '806 Tanya Stream\nNew Jeffreymouth, OH 31051', 'Nationality': 'Guatemala', 'Social Security Number': '237-87-3585', 'Alien File Number': 119580763}, 'Biometric data': {'Height': '4ft 7inch', 'Weight': '120kg', 'Eye color': 'Hazel', 'Hair color': 'White'}}, 'id_2': {'Biographic data': {'Name': 'Tiffany House', 'Aliases': 'jmonroe', 'Date of birth': '1992-12-05', 'Phone numbers': '241-586-8344', 'Addresses': '690 Sanchez Union Suite 625\nChristopherhaven, WI 21957', 'Nationality': 'Maldives', 'Social Security Number': '861-51-6071', 'Alien File Number': 177366680}, 'Biometric data': {'Height': '4ft 6inch', 'Weight': '60kg', 'Eye color': 'Hazel', 'Hair color': 'Bald'}}, 'id_3': {'Biographic data': {'Name': 'Allen Williams DDS', 'Aliases': 'kholland', 'Date of birth': '1973-11-13', 'Phone numbers': '038.836.8595', 'Addresses': '890 Bowers View Apt. 883\nHerringfort, MN 75211', 'Nationality': 'Mexico', 'Social Security Number': '205-65-6774', 'Alien File Number': 775747704}, 'Biometric data': {'Height': '175cm', 'Weight': '27st 0lb', 'Eye color': 'Amber', 'Hair color': 'Brown'}}, 'id_4': {'Biographic data': {'Name': 'Mr. Gregory Ryan', 'Aliases': 'stephen03', 'Date of birth': '1991-12-27', 'Phone numbers': '(892)184-0110', 'Addresses': '41925 Jones Estate Suite 824\nShawnmouth, NJ 15468', 'Nationality': 'Anguilla', 'Social Security Number': '320-50-5626', 'Alien File Number': 655004368}, 'Biometric data': {'Height': '148cm', 'Weight': '34st 11lb', 'Eye color': 'Amber', 'Hair color': 'Auburn'}}}
Solution appended below. Please add a solution if you believe yours is a better alternative. I'd love to learn other approaches for the future.
Inspired by the top solution to "Simple multithread for loop in Python".
I used multiprocessing.dummy as mp and converted my big for loop into a function.
All "entry" dictionaries are collected into the list dicts and then merged into the dictionary big_boi_dict, as originally intended.
...
def alien_dict_func(entry):
    # Programmatic key-values
    alien_dict = {}
    entry_dict = {}
    for i, subset in enumerate(alien_key_val):
        subset_dict = {}
        subset_name = alien_key_val[i][0]
        for data in subset[1]:
            key, invoc = data[0], data[1]
            #if ('faker.unique.' in invoc) or ('random.' in invoc) or ('tf.' in invoc) or ("''.join" in invoc) or ("'{}" in invoc): val = eval(invoc)
            if invoc[-1] != ':': val = eval(invoc)
            else: val = ""
            if 'Identification numbers' in key: val = {i[0]: i[1] for i in val}
            subset_dict.update({key: val})
        entry_dict.update({subset_name: subset_dict})
    alien_dict.update({'id_' + str(entry): entry_dict})
    #print("\nALIEN_DICT:\n", alien_dict)
    return alien_dict
import multiprocessing.dummy as mp
if __name__=="__main__":
    p=mp.Pool(4)
    dicts = p.map(alien_dict_func, range(1, entries+1)) # range(0,1000) if you want to replicate your example
    #print("DICTS: ", dicts)
    big_boi_dict = {}
    for d in dicts: big_boi_dict.update(d)
    print("big_boi_dict: ", big_boi_dict)
    p.close()
    p.join()
>>> big_boi_dict: {'id_1': {'Biographic data': {'Name': 'Jacob Gaines', 'Aliases': 'laurenswanson', 'Date of birth': '2016-04-20', 'Phone numbers': '630-868-7169x899', 'Addresses': '0340 Lydia Passage Suite 898\nAliciaside, NC 54017', 'Nationality': 'Netherlands Antilles', 'Social Security Number': '646-75-5043', 'Alien File Number': 216185864}, 'Biometric data': {'Height': '4ft 3inch', 'Weight': '84kg', 'Eye color': 'Gray', 'Hair color': 'Blond'}}, 'id_2': {'Biographic data': {'Name': 'Carlos Wallace', 'Aliases': 'andreabray', 'Date of birth': '1927-09-11', 'Phone numbers': '069-056-6401x106', 'Addresses': '7567 Timothy Drive Suite 202\nMichealberg, WY 38137', 'Nationality': 'Zambia', 'Social Security Number': '423-34-8418', 'Alien File Number': 472177351}, 'Biometric data': {'Height': '7ft 0inch', 'Weight': '111kg', 'Eye color': 'Amber', 'Hair color': 'Brown'}}, 'id_3': {'Biographic data': {'Name': 'Jason Hill', 'Aliases': 'kimberly73', 'Date of birth': '2002-11-20', 'Phone numbers': '661.123.2301x4271', 'Addresses': '16908 Amanda Key\nLake Taraville, OH 89507', 'Nationality': 'Italy', 'Social Security Number': '855-86-1944', 'Alien File Number': 20427192}, 'Biometric data': {'Height': '125cm', 'Weight': '77kg', 'Eye color': 'Brown', 'Hair color': 'White'}}, 'id_4': {'Biographic data': {'Name': 'Melinda White PhD', 'Aliases': 'hartmanerica', 'Date of birth': '1917-05-19', 'Phone numbers': '(174)876-1971x2693', 'Addresses': '8478 Kristina Road Suite 710\nSaraview, ND 82480', 'Nationality': 'French Southern Territories', 'Social Security Number': '779-13-3745', 'Alien File Number': 501832948}, 'Biometric data': {'Height': '148cm', 'Weight': '128kg', 'Eye color': 'Gray', 'Hair color': 'Auburn'}}}
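Note that multiprocessing.dummy exposes the multiprocessing API but runs on threads. A roughly equivalent sketch with the standard library's concurrent.futures.ThreadPoolExecutor is shown below. The function build_entry is a hypothetical stand-in for alien_dict_func so that the example runs without Faker, and the "Height" field is only sample content. In standard CPython builds, threads mainly help with I/O-bound work; for CPU-bound generation a process pool may be worth trying instead.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def build_entry(entry_id):
    """Hypothetical stand-in for alien_dict_func: returns {'id_N': {...}}."""
    rng = random.Random(entry_id)  # per-call generator, so threads share no random state
    return {f"id_{entry_id}": {"Height": f"{rng.randint(100, 200)}cm"}}

with ThreadPoolExecutor(max_workers=4) as pool:
    # map() returns results in input order, whichever thread finishes first.
    results = list(pool.map(build_entry, range(1, 5)))

# Merge the per-entry dictionaries into one, as in the answer above.
merged = {}
for d in results:
    merged.update(d)

print(sorted(merged))
```

Giving each call its own random.Random instance is a design choice made here to keep the output independent of thread scheduling.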
As for handling every element of the loop simultaneously, I tried something like this:
import time
import threading
class handle_data(threading.Thread):
    def __init__(self, threadID, name, counter):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.counter = counter
    def run(self): # handle_data
        print ("Starting " + self.name)
        print(self.name, 5, self.counter)
        print("Exiting " + self.name)
    def stop(self):
        self.running = False
if __name__ == "__main__":
    for data in range(10):
        handle_data(data,"Thread-"+str(data),data).start()

How can I give a specific dictionary, which is in a list, a name?

I have this function that creates a dictionary for one student.
I've spent days searching the web and trying things out, but the only change in output that I've managed is putting an empty list (without a name) into the json file: a [] was written to the file.
import json

def add_student_to_database(fname, lname, test1, test2, test3):
    fullname= '%s %s' % (fname, lname)
    all_students = []
    def lettergrade(test1,test2,test3):
        overall = ( int(test1+test2+test3) )/3
        if overall >= 93:
            letter = 'A'
        elif overall >= 90:
            letter = 'A-'
        elif overall >= 87:
            letter = 'B+'
        elif overall >= 83:
            letter = 'B'
        elif overall >= 80:
            letter = 'B-'
        elif overall >= 77:
            letter = 'C+'
        elif overall >= 70:
            letter = 'C'
        elif overall >= 60:
            letter = 'D'
        elif overall < 60:
            letter = 'F'
        return letter
    student = {
        "First name": fname,
        "Last name": lname,
        "Test 1": test1,
        "Test 2": test2,
        "Test 3": test3,
        "Grade": lettergrade(test1,test2,test3)
    }
    all_students.append(student)
    with open('students.json','a+')as json_file:
        json.dump(all_students,json_file, indent= 4)
I expect to get:
'all_students': [
{'John Doe':
'tests':{
'test 1': 100,
'test 2': 100,
'test 3': 100
}
{'Will Smith':
'tests': {}(repeat for a bunch of students)
]
Instead, when it does run well, I get
{
'first name': 'John',
'Last name': 'Doe',
'Test 1': 100,
'Test 2': 100,
'Test 3': 100
}
I want to name the list "all_students" and each individual student's dictionary named by the variable fullname.
I tried starting all over again with the original code that I had (the one posted here) and its throwing this error:
Traceback (most recent call last):
File "./grades.py", line 12, in <module>
class STUDENTS(object):
File "./grades.py", line 81, in STUDENTS
add_student_to_database(fn,ln,t1,t2,t3)
File "./grades.py", line 54, in add_student_to_database
"Grade": lettergrade(test1,test2,test3)
NameError: name 'student' is not defined
Which I managed to fix but forgot how I did it. So, can you help me with all of this please?
I tested your code (substituting a print statement for the final two lines) and it outputs what I expected, which is a single dictionary contained within a list:
[{'First name': 'Chris', 'Last name': 'Sullivan', 'Test 1': 86, 'Test 2': 99, 'Test 3': 88, 'Grade': 'A-'}]
Also, I don't think you want the '+' in the call to open, as I don't think you are going to do anything but append to the file.
Finally, I don't think the all_students list will ever have more than one element, as it is re-initialized every time you run add_student_to_database. To build up the list, you would have to declare it outside the function, build it into a class, or use a callback function.
Here's a class that is hopefully close to what you're after.
import json
class all_students():
    def __init__(self):
        """ Set up an empty dictionary to hold the student information, and
            another dictionary to contain the thresholds for each grade level.
            The keys for the dictionary must be in descending order.
        """
        self.all_students = {}
        # Thresholds follow the grade boundaries used in the question.
        self.grades = {93: 'A', 90: 'A-', 87: 'B+', 83: 'B', 80: 'B-', 77: 'C+', 70: 'C', 60: 'D', 0: 'F'}
    def lettergrade(self, tests):
        """ Returns letter grade when passed a tuple of individual test scores.
            The first argument is a tuple containing the test scores (e.g. (91, 66, 82))
            The average is calculated by dividing the sum of the elements in the
            tuple by the number of elements.
            Then the grades dictionary is searched for the first threshold that
            the average meets or exceeds. When that is found, the grade is returned.
        """
        overall = int(sum(tests)/len(tests))
        for score, grade in self.grades.items():
            if overall >= score:
                return grade
    def add_student_to_database(self, fname, lname, *tests):
        """ Adds (or replaces) a student grade entry in the all_students dictionary. The new entry
            is represented by a dictionary containing the individual test scores and the letter grade.
            The test scores are passed as individual arguments. *tests gathers
            those positional arguments into a tuple, e.g. (91, 66, 82)
            A new dictionary containing the first & last names plus the letter
            grade is added to the all_students dictionary, with the key equal
            to the student's full name.
            Finally, that dictionary is updated with the individual test scores.
            That update uses "Test 1", "Test 2", etc. as the key. The enumerate
            function provides the test number in the same order as it is stored
            in the tuple (Note the use of an f-string to format the key). This
            function will return the index (position) of each element to the
            variable i, and the value of the test score in variable g. The
            indices start at 0 so we add 1 to start with Test 1, not Test 0.
        """
        fullname = f"{fname} {lname}"
        self.all_students[fullname] = {
            'First name': fname,
            'Last name': lname,
            'Grade': self.lettergrade(tests)
        }
        self.all_students[fullname].update(dict((f'Test {i+1}', g) for i, g in enumerate(tests)))
    def show_students(self):
        """ Prints the names and letter grade for each student. Note the
            use of an f-string to format the output. Also see the use of
            the items() method to return the key/value pairs to the variables
            fullname/grades respectively for each iteration of the for loop.
        """
        for fullname, grades in self.all_students.items():
            print(f"{fullname}: Grade is {grades['Grade']}")
    def write_file(self, fname='students.json'):
        """ Writes student info to json file fname & prints summary
            This is basically the same as your original.
        """
        with open(fname, 'a') as json_file:
            json.dump(self.all_students, json_file, indent=4)
        self.show_students()
# The following is run as a test when this file is run (e.g. Python programname.py)
if __name__ == '__main__':
    students = all_students()
    students.add_student_to_database('John', 'Doe', 80, 88, 92)
    students.add_student_to_database('John', 'Public', 95, 91, 80)
    students.write_file()
It uses a dictionary of dictionaries, with the outer dictionary keyed by the student's full name, and the inner dictionary similar to what you already had. I decided to allow an arbitrary number of test scores, so it works with anywhere from 1 to n. It should probably check that the number of test scores is greater than zero, and that all scores fall between 0 and 100. Each call to add_student_to_database will overwrite the previous entry for the same student.
Here's the json file it produces.
{
"John Doe": {
"First name": "John",
"Last name": "Doe",
"Grade": "B",
"Test 1": 80,
"Test 2": 88,
"Test 3": 92
},
"John Public": {
"First name": "John",
"Last name": "Public",
"Grade": "B+",
"Test 1": 95,
"Test 2": 91,
"Test 3": 80
}
}
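The answer mentions that the scores should probably be validated. A minimal sketch of such a check follows; validate_scores is a hypothetical helper that is not part of the class above, and it could be called at the start of add_student_to_database.

```python
def validate_scores(*tests):
    """Hypothetical helper: require at least one score, each within 0..100."""
    if not tests:
        raise ValueError("at least one test score is required")
    for score in tests:
        if not 0 <= score <= 100:
            raise ValueError(f"score out of range: {score}")
    return tests

print(validate_scores(80, 88, 92))

try:
    validate_scores(95, 101)
except ValueError as err:
    print(err)
```

Raising ValueError is one option; returning a boolean or skipping the student would work as well, depending on how the caller should react.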

Python nested IF statement not iterating the entire list

I need some help understanding why my code is not iterating over the complete list and how I can correct this. I need to replace some values in list B with values from list A before doing another process. The code is supposed to give me a final list of
b = ['Sick', "Mid 1", "off", "Night", "Sick", "Morning", "Night"]
I was thinking of 2 nested IF statements, because it's evaluating 2 different things. My code gives me
['Sick', 'Mid 1', 'off', 'Night', 'off', 'Morning', 'Night']
which is correct for element [0], but not for element [4].
I was playing with the indentation of i = i+1
a = ['Sick', 'PR', '', 'PR', 'Sick', 'PR', 'PR']
b = ["off", "Mid 1", "off", "Night", "off", "Morning", "Night"]
i = 0
for x in the_list:
for y in see_drop_down_list:
if x =="off":
if y == "":
the_list[i] = "off"
else:
the_list[i]=see_drop_down_list[i]
i = i + 1
print (the_list)
You don't need to do double iteration here. Corrected code:
a = ['Sick', 'PR', '', 'PR', 'Sick', 'PR', 'PR']
b = ['off', 'Mid 1', 'off', 'Night', 'off', 'Morning', 'Night']
for i in range(len(b)): # loop through all indexes of elements in "b"
    if b[i] == 'off' and a[i]: # replace element, if it's "off" and corresponding element in "a" is not empty
        b[i] = a[i]
print(b)
Output:
['Sick', 'Mid 1', 'off', 'Night', 'Sick', 'Morning', 'Night']
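The same replacement can also be sketched without indexes by pairing the two lists with zip. This variant builds a new list (named result here for illustration) instead of modifying b in place.

```python
a = ['Sick', 'PR', '', 'PR', 'Sick', 'PR', 'PR']
b = ['off', 'Mid 1', 'off', 'Night', 'off', 'Morning', 'Night']

# Take the value from "a" only where "b" is 'off' and "a" is not empty.
result = [x if y == 'off' and x else y for x, y in zip(a, b)]
print(result)
```

zip stops at the shorter list, so this assumes both lists have the same length.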

I don't know why the second if block doesn't work?

#!/usr/bin/python
from TwitterSearch import *
import sys
import csv
tso = TwitterSearchOrder() # create a TwitterSearchOrder object
tso.set_keywords(['gmo']) # let's define all words we would like to have a look for
tso.set_language('en') # we want to see English tweets only
tso.set_include_entities(False) # and don't give us all those entity information
max_range = 1 # search range in kilometres
num_results = 500 # minimum results to obtain
outfile = "output.csv"
# create twitter API object
twitter = TwitterSearch(
access_token = "764537836884242432-GzJmUSL4hcC2DOJD71TiQXwCA0aGosz",
access_token_secret = "zDGYDeigRqDkmdqTgBOltcfNcNnfLwRZPkPLlnFyY3xqQ",
consumer_key = "Kr9ThiJWvPa1uTXZoj4O0YaSG",
consumer_secret = "ozGCkXtTCyCdOcL7ZFO4PJs85IaijjEuhl6iIdZU0AdH9CCoxS"
)
# Create an array of USA states
ustates = [
"AL",
"AK",
"AS",
"AZ",
"AR",
"CA",
"CO",
"CT",
"DE",
"DC",
"FM",
"FL",
"GA",
"GU",
"HI",
"ID",
"IL",
"IN",
"IA",
"KS",
"KY",
"LA",
"ME",
"MH",
"MD",
"MA",
"MI",
"MN",
"MS",
"MO",
"MT",
"NE",
"NV",
"NH",
"NJ",
"NM",
"NY",
"NC",
"ND",
"MP",
"OH",
"OK",
"OR",
"PW",
"PA",
"PR",
"RI",
"SC",
"SD",
"TN",
"TX",
"UT",
"VT",
"VI",
"VA",
"WA",
"WV",
"WI",
"WY",
"USA"
]
def linearSearch(item, obj, start=0):
    for i in range(start, len(obj)):
        if item == obj[i]:
            return True
    return False
# open a file to write (mode "w"), and create a CSV writer object
csvfile = file(outfile, "w")
csvwriter = csv.writer(csvfile)
# add headings to our CSV file
row = [ "user", "text", "place"]
csvwriter.writerow(row)
#-----------------------------------------------------------------------
# the twitter API only allows us to query up to 100 tweets at a time.
# to search for more, we will break our search up into 10 "pages", each
# of which will include 100 matching tweets.
#-----------------------------------------------------------------------
result_count = 0
last_id = None
while result_count < num_results:
    # perform a search based on latitude and longitude
    # twitter API docs: https://dev.twitter.com/docs/api/1/get/search
    query = twitter.search_tweets_iterable(tso)
    for result in query:
        state = 0
        if result["place"]:
            user = result["user"]["screen_name"]
            text = result["text"]
            text = text.encode('utf-8', 'replace')
            place = result["place"]["full_name"]
            state = place.split(",")[1]
            if linearSearch(state,ustates):
                print state
                # now write this row to our CSV file
                row = [ user, text, place ]
                csvwriter.writerow(row)
                result_count += 1
        last_id = result["id"]
    print "got %d results" % result_count
csvfile.close()
I am trying to categorize the tweets by my array ustates, but the second if block doesn't seem to work, and I have no idea why. What I did was a linear search: if my item is equal to an item in my array, I write it to a csv file.
It looks like the problem is some leftover whitespace; you can use .strip() to remove it:
>>> x=" WY "
>>> x.strip()
'WY'
>>>
Also, some other tips:
To speed up the membership test in ustates, use a set instead of a list, because a set has a constant-time membership check on average, while a list requires a linear search.
The preferred way to open a file is with a context manager, which ensures the file is closed at the end of the block or if an error occurs inside it. Also, use open instead of file.
With those tips, the code should look like this:
#!/usr/bin/python
... # all the previous stuff
# Create a set of USA states
ustates = {
"AL", "AK", "AS", "AZ", "AR",
"CA", "CO", "CT",
"DE", "DC",
"FM", "FL",
"GA", "GU",
"HI",
"ID", "IL", "IN", "IA",
"KS", "KY",
"LA",
"ME", "MH", "MD", "MA", "MI", "MN", "MS", "MO", "MT", "MP",
"NE", "NV", "NH", "NJ", "NM", "NY", "NC", "ND",
"OH", "OK", "OR",
"PW", "PA", "PR",
"RI",
"SC", "SD",
"TN", "TX",
"UT",
"VT", "VI", "VA",
"WA", "WV", "WI", "WY",
"USA"
} # this arrangement just takes fewer lines while grouping the codes alphabetically
# open a file to write (mode "w"), and create a CSV writer object
with open(outfile,"w") as csvfile:
    ... # the rest is the same
    while result_count < num_results:
        # perform a search based on latitude and longitude
        # twitter API docs: https://dev.twitter.com/docs/api/1/get/search
        query = twitter.search_tweets_iterable(tso)
        for result in query:
            state = 0
            if result["place"]:
                ... # all the other stuff
                state = state.strip() #<--- the strip part, add the .upper() if needed or just in case
                if state in ustates:
                    ... # all the other stuff
            ... # the rest of stuff
        print "got %d results" % result_count
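To see why the comparison fails without strip(), note that splitting "Cheyenne, WY" on "," leaves " WY" with a leading space. The Python 3 sketch below isolates that step. The helper state_of, the shortened US_STATES set and the sample place strings are made up for illustration and are not part of the original script.

```python
US_STATES = {"WY", "CA", "NY"}  # shortened set, for the example only

def state_of(place):
    """Return the upper-cased text after the first comma, or None if there is no comma."""
    parts = place.split(",")
    if len(parts) < 2:
        return None
    return parts[1].strip().upper()

print("Cheyenne, WY".split(",")[1] in US_STATES)  # False: the value is ' WY'

for place in ["Cheyenne, WY", "Paris, France", "Texas"]:
    state = state_of(place)
    print(place, "->", state, state in US_STATES)
```

Returning None when there is no comma also avoids the IndexError that split(",")[1] would raise for a place name such as "Texas".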
