Attempting to use choropleth maps in folium for first time, index error - python-3.x

This is my first time posting to stackoverflow, so I hope I am doing this correctly. I am currently finishing up a 'jumpstart' introduction to data analytics. We are utilizing Python with a few different packages, such as pandas, seaborn, folium etc. For part of our final project/presentation, I am trying to make a zipcode choropleth map. I have successfully imported folium and have my map displayed - the choropleth concept is new to me and is completely extracurricular. Trying to challenge myself to succeed.
I found an example of creating a choropleth map here that I am trying to use: https://medium.com/#saidakbarp/interactive-map-visualization-with-folium-in-python-2e95544d8d9b. I believe I correctly substituted the different object names for the data frame and map name that I am working with. For the geoJSON data, I found this https://github.com/OpenDataDE/State-zip-code-GeoJSON. I opened the geoJSON file in Atom, and found the feature title for what I believe to be the five digit zipcode 'ZCTA5CE10'.
Here is my code:
folium.Choropleth(geo_data='../data/tn_tennessee_zip_codes_geo.min.json',
data=slow_to_resolve,
columns=['zipcode'],
key_on='feature.properties.ZCTA5CE10',
fill_color='BuPu', fill_opacity=0.7, line_opacity=0.2,
legend_name='Zipcode').add_to(nash_map)
folium.LayerControl().add_to(nash_map)
nash_map
When I try to run the code, I get this error:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-114-a2968de30f1b> in <module>
----> 1 folium.Choropleth(geo_data='../data/tn_tennessee_zip_codes_geo.min.json',
2 data=slow_to_resolve,
3 columns=['zipcode'],
4 key_on='feature.properties.ZCTA5CE10',
5 fill_color='BuPu', fill_opacity=0.7, line_opacity=0.2,
~\anaconda3\lib\site-packages\folium\features.py in __init__(self, geo_data, data, columns, key_on, bins, fill_color, nan_fill_color, fill_opacity, nan_fill_opacity, line_color, line_weight, line_opacity, name, legend_name, overlay, control, show, topojson, smooth_factor, highlight, **kwargs)
1198 if hasattr(data, 'set_index'):
1199 # This is a pd.DataFrame
-> 1200 color_data = data.set_index(columns[0])[columns[1]].to_dict()
1201 elif hasattr(data, 'to_dict'):
1202 # This is a pd.Series
IndexError: list index out of range
Prior to this error, I had two columns from my dataframe specified, but I got some 'isnan' error that I am pretty sure was attributed to string type data in the second column, so I removed it. Now currently trying to figure out this posted error.
Can someone point me in the right direction? Please keep in mind that aside from this three week jumpstart program, I have zero programming knowledge or experience - so I am still learning terminology and concepts.
Thank you!

The error you got was an IndexError: list index out of range because you provided to the columns parameter with just one column columns=['zipcode']. It has to be two like this: columns=['zipcode', 'columnName_to_color_map'].
Where the first column 'zipcode' must match the object node in the GeoJSON file data (note the format/type must also match, that is string: '11372' IS NOT integer: 11372). The second column 'columnName_to_color_map' or whatever name you used should be the column which defined the choropleth colors.
Also note that key_on should match the first column 'zipcode'.
So the code should look like this:-
folium.Choropleth(geo_data='../data/tn_tennessee_zip_codes_geo.min.json',
data=slow_to_resolve,
columns=['zipcode', 'columnName_to_color_map'],
key_on='feature.properties.ZCTA5CE10',
fill_color='BuPu', fill_opacity=0.7, line_opacity=0.2,
legend_name='Zipcode').add_to(nash_map)
folium.LayerControl().add_to(nash_map)
nash_map

Related

Getting a list of VALUES rather than CLASSES in sqlalchemy

This problem seems very simple, but I'm having trouble finding an already existing solution on StackOverflow.
When I run a sqlalchemy command like the following
valid_columns = db.session.query(CrmLabels).filter_by(user_id=client_id).first()
I get back a CrmLabels object that is not iterable. If I print this object, I get a list
[Convert Source, Convert Medium, Landing Page]
But this is not iterable. I would like to get exactly what I've shown above, except as a list of strings
['Convert Source', 'Convert Medium', 'Landing Page']
How can I run a query that will return this result?
Below change should do it:
valid_columns = (
db.session.query(CrmLabels).filter_by(user_id=client_id)
.statement.execute() # just add this
.first()
)
However, you need to be certain about the order of columns, and you can use valid_columns.keys() to make sure the values are in the expected order.
Alternatively, you can create a dictionary using dict(valid_columns.items()).

Python: How to excute a variable in a string in a for loop in a function?

I have an excel (output of survey) and I am trying to write codes to give marks based on the entries of a survey of different students.
Suppose the data at hand is given as follow
import warnings, pickle
import numpy as np, pandas as pd
warnings.filterwarnings('ignore')
A='pickle.loads(b\'\\x80\\x03cpandas.core.frame\\nDataFrame\\nq\\x00)\\x81q\\x01}q\\x02(X\\x05\\x00\\x00\\x00_dataq\\x03cpandas.core.internals.managers\\nBlockManager\\nq\\x04)\\x81q\\x05(]q\\x06(cpandas.core.indexes.base\\n_new_Index\\nq\\x07cpandas.core.indexes.base\\nIndex\\nq\\x08}q\\t(X\\x04\\x00\\x00\\x00dataq\\ncnumpy.core.multiarray\\n_reconstruct\\nq\\x0bcnumpy\\nndarray\\nq\\x0cK\\x00\\x85q\\rC\\x01bq\\x0e\\x87q\\x0fRq\\x10(K\\x01K\\x04\\x85q\\x11cnumpy\\ndtype\\nq\\x12X\\x02\\x00\\x00\\x00O8q\\x13K\\x00K\\x01\\x87q\\x14Rq\\x15(K\\x03X\\x01\\x00\\x00\\x00|q\\x16NNNJ\\xff\\xff\\xff\\xffJ\\xff\\xff\\xff\\xffK?tq\\x17b\\x89]q\\x18(X\\x04\\x00\\x00\\x00Nameq\\x19X \\x00\\x00\\x00Which companies have you chosen?q\\x1aX\\n\\x00\\x00\\x00Question 1q\\x1bX\\n\\x00\\x00\\x00Question 2q\\x1cetq\\x1dbX\\x04\\x00\\x00\\x00nameq\\x1eNu\\x86q\\x1fRq h\\x07cpandas.core.indexes.range\\nRangeIndex\\nq!}q"(h\\x1eNX\\x05\\x00\\x00\\x00startq#K\\x00X\\x04\\x00\\x00\\x00stopq$K\\x04X\\x04\\x00\\x00\\x00stepq%K\\x01u\\x86q&Rq\\\'e]q(h\\x0bh\\x0cK\\x00\\x85q)h\\x0e\\x87q*Rq+(K\\x01K\\x04K\\x04\\x86q,h\\x15\\x89]q-(X\\x03\\x00\\x00\\x00Ayaq.X\\x04\\x00\\x00\\x00Ramiq/X\\x07\\x00\\x00\\x00Geniousq0X\\x05\\x00\\x00\\x00Samirq1G\\x7f\\xf8\\x00\\x00\\x00\\x00\\x00\\x00X?\\x00\\x00\\x00 Mobil, ConsolidatedEdisonq2X?\\x00\\x00\\x00 DataGeneral, GeneralPublicUtilitiesq3h3X\\x11\\x00\\x00\\x00Uploaded the dataq4X\\xfe\\x00\\x00\\x00Uploaded the data,Specified the reason behind the selected stocks,You were successful in cleaning your data-sets,You have had a justification why you selected this particular time period of your chosen stock. This should have been answered in Question 2.q5X{\\x01\\x00\\x00Uploaded the data,Specified the reason behind the selected stocks,You selected both stocks from different industries,You were successful in cleaning your data-sets,You have had a justification why you selected this particular time period of your chosen stock. This should have been answered in Question 2.,You did extra work for this question and you deserve a round of applause.q6X\\xec\\x00\\x00\\x00Specified the reason behind the selected stocks,You were successful in cleaning your data-sets,You have had a justification why you selected this particular time period of your chosen stock. This should have been answered in Question 2.q7G\\x7f\\xf8\\x00\\x00\\x00\\x00\\x00\\x00X(\\x01\\x00\\x00You plot a graph over the time axis that demonstrate how your selected returns behaved in exciting periods,You produced a scatter plot that shows that you may change your x-axis from time to market return.,You also were able to compare your selected stock to other stocks within the same industryq8X\\xde\\x01\\x00\\x00You plot a graph over the time axis that demonstrate how your selected returns behaved in exciting periods,You produced a scatter plot that shows that you may change your x-axis from time to market return.,You have shown some references to what has been going in your chosen period and you gave a good justification.,You also were able to compare your selected stock to other stocks within the same industry,You did extra work for this question and you deserve a very high mark.q9h8etq:ba]q;h\\x07h\\x08}q<(h\\nh\\x0bh\\x0cK\\x00\\x85q=h\\x0e\\x87q>Rq?(K\\x01K\\x04\\x85q#h\\x15\\x89]qA(h\\x19h\\x1ah\\x1bh\\x1cetqBbh\\x1eNu\\x86qCRqDa}qEX\\x06\\x00\\x00\\x000.14.1qF}qG(X\\x04\\x00\\x00\\x00axesqHh\\x06X\\x06\\x00\\x00\\x00blocksqI]qJ}qK(X\\x06\\x00\\x00\\x00valuesqLh+X\\x08\\x00\\x00\\x00mgr_locsqMcbuiltins\\nslice\\nqNK\\x00K\\x04K\\x01\\x87qORqPuaustqQbX\\x04\\x00\\x00\\x00_typqRX\\t\\x00\\x00\\x00dataframeqSX\\t\\x00\\x00\\x00_metadataqT]qUub.\')'
df1=eval(A)
I wrote these function to help me understand the meaning of applying or getting results
# This function will give you the students's answer {Answer_Student_1_Q_1, Answer_Student_1_Q_2}
def Get_student_answers_per_question(i1, total_number_of_questions):
g=df1.index[df1['Name']!='Genious'][i1]
Gen1=df1.ix[g,:]
print(Gen1)
for i in range(total_number_of_questions):
if isinstance(Gen1[i+2], float):
foo='Answer_Student_'+str(i1+1)+'_Q_'+str(i+1)+ '= Gen1['+ str(i+2) +']'
else:
foo='Answer_Student_'+str(i1+1)+'_Q_'+str(i+1)+ '= Gen1['+ str(i+2) +'].split(",")'
exec(foo, locals(), globals())
pass
# Get student score for each answer in question
def Get_score_per_question(student, question, dictionary_mark, total_number_of_questions):
# Get_all_answers will get all the answers for student (student)
Get_all_answers=Get_student_answers_per_question(student-1,total_number_of_questions)
foo='Answer_Student_'+str(student)+'_Q_'+str(question)
print(foo)
v=[dictionary_mark.get(i) for i in exec(foo)]
return v
Now in the last function, Get_score_per_question, I was trying to code
v=[dictionary_mark.get(i) for i in exec(foo)]
where v is the score of the variable if available in the answer of the dictionary.
so depending on the entries in the string variable foo the results would be of same length with numbers
The example that I am trying to run is this
student=1
question=1
dictionary_mark={'Uploaded the data': 1,
'Specified the reason behind the selected stocks': 1,
'You selected both stocks from different industries': 1,
'You were successful in cleaning your data-sets': 2,
'You have had a justification why you selected this particular time period of your chosen stock. This should have been answered in Question 2.': 1,
'You did extra work for this question and you deserve a round of applause.': 1}
total_number_of_questions=2
Get_score_per_question(student, question, dictionary_mark, total_number_of_questions)
Where, as you can foresee, I will get the following error
TypeError: 'NoneType' object is not iterable
Can somebody help me in this regards, and is there any tutorial or page that someone could refer me to do a better coding in such surveys in python espcially when handling splits and so on.
Docs of exec() function : https://docs.python.org/3/library/functions.html#exec
Be aware that the return and yield statements may not be used outside of function definitions even within the context of code passed to the exec() function. The return value is None.
And you have used this piece of code : [dictionary_mark.get(i) for i in exec(foo)]
So obviously the None object doesn't implement __iter__ function and you get Type Error when you try to loop through

gmplot Marker does not work after it marks 256 points

I am trying to mark a bunch of points on the map with gmplot and observed that after a certain point it stops marking and wipes out all the previously marked points. I debugged the gmplot.py module and saw that when the length of points array exceeds 256 this is happening without giving any error and warning.
self.points = [] on gmplot.py
Since I am very new to Python and OOPs concept, is there a way to override this and mark more than 256 points?
Are you using gmplot.GoogleMapPlotter.Scatter or gmplot.GoogleMapPlotter.Marker. I used either and was able to get 465 points for a project that I was working on. Is it possible it is an API key issue for you?
partial snippet of my code
import gmplot
import pandas as pd
# df is the database with Lat, Lon and formataddress columns
# change to list, not sure you need to do this. I think you can cycle through
# directly using iterrows. I have not tried that though
latcollection=df['Lat'].tolist()
loncollection=df['Lon'].tolist()
addcollection=df['formataddress'].tolist()
# center map with the first co-ordinates
gmaps2 = gmplot.GoogleMapPlotter(latcollection[0],loncollection[0],13,apikey='yourKey')
for i in range(len(latcollection)):
gmaps2.marker(latcollection[i],loncollection[i],color='#FF0000',c=None,title=str(i)+' '+ addcollection[i])
gmaps2.draw(newdir + r'\laplot_marker_full.html')
I could hover over the 465th point since I knew approximately where it was and I was able to get the title with str(464) <formataddress(464)>, since my array is indexed from 0
Make sure you check the GitHub site to modify your gmplot file, in case you are working with windows.

How to solve this data set

I am importing a Data set from quandl using API. Everything is perfect, however the time series I am importing is reversed. By this I mean if I used the .head method to print the first elements in the data set, I will get the latest Data set figures and printing the tail will get oldest figures
df = pd.read_csv("https://www.quandl.com/api/v3/datasets/CHRIS/CME_CD4.csv?api_key=H32H8imfVNVm9fcEX6kB",parse_dates=['Date'],index_col='Date')
df.head()
This should be a pretty easy fix if I understand. Credit to behzad.nouri on this answer Right way to reverse pandas.DataFrame?.
You just need to reverse the order of your dataframe using the line below.
df = df.reindex(index=df.index[::-1])

How to use a conditional statement while scraping?

I'm trying to scrape the MTA website and need a little help scraping the "Train Lines Row." (Website for reference: https://advisory.mtanyct.info/EEoutage/EEOutageReport.aspx?StationID=All
The train line information is stored as image files (1 line subway, A line subway, etc) describing each line that's accessible through a particular station. I've had success scraping info out of rows in which only one train passes through, but I'm having difficulty figuring out how to iterate through the columns which have multiple trains passing through it...using a conditional statement to test for whether it has one line or multiple lines.
tableElements = table.find_elements_by_tag_name('tr')
that's the table i'm iterating through
tableElements[2].find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_element_by_tag_name('img').get_attribute('alt')
this successfully gives me the values if only one value exists in the particular column
tableElements[8].find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_elements_by_tag_name('img')
this successfully gives me a list of values I can successfully iterate through to extract my needed values.
Now I try and combine these lines of code together in a forloop to extract all the information without stopping.
for info in tableElements[1:]:
if info.find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_elements_by_tag_name('img')[1] == True:
for images in info.find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_elements_by_tag_name('img'):
print(images.get_attribute('alt'))
else:
print(info.find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_element_by_tag_name('img').get_attribute('alt'))
I'm getting the error message: "list index out of range." I dont know why, as every iteration done in isolation seems to work. My hunch is I haven't correctly used the boolean operation properly here. My idea was that if find_elements_by_tag_name had an index of [1] that would mean multiple image text for me to iterate through. Hence, why I want to use this boolean operation.
Hi All, thanks so much for your help. I've uploaded my full code to Github and attached a link for your reference: https://github.com/tsp2123/MTA-Scraping/blob/master/MTA.ElevatorData.ipynb
The end goal is going to be to put this information into a dataframe using some formulation of and having a for loop that will extract the image information that I want.
dataframe = []
for elements in tableElements:
row = {}
columnName1 = find_element_by_class_name('td')
..
Your logic isn't off here.
"My hunch is I haven't correctly used the boolean operation properly here. My idea was that if find_elements_by_tag_name had an index of [1] that would mean multiple image text for me to iterate through."
The problem is it can't check if the statement is True if there's nothing in index position [1]. Hence the error at this point.
if info.find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_elements_by_tag_name('img')[1] == True:
What you want to do is use try: So something like:
for info in tableElements[1:]:
try:
if info.find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_elements_by_tag_name('img')[1] == True:
for images in info.find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_elements_by_tag_name('img'):
print(images.get_attribute('alt'))
else:
print(info.find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_element_by_tag_name('img').get_attribute('alt'))
except:
#do something else
print ('Nothing found in index position.')
Is it also possible to back to your question and provide the full code? When I try this, I'm getting 11 table elements, so want to test it with the specific table you're trying to scrape.

Resources