Getting wrong results for country names using geograpy3 - python-3.x

I tried running the example code as given in the readme file for geograpy3. However, I am getting answers like this. What can be done about it?

Your question raises a similar issue as https://github.com/somnathrakshit/geograpy3/issues/3
There is now a get_geoPlace_context function that limits the search to NLTK's GPE label, ignoring the PERSON and ORGANIZATION entries that the original get_place_context function would include.
See also test_extractor.py:
def testGetGeoPlace(self):
    '''
    test geo place handling
    '''
    url = 'http://www.bbc.com/news/world-europe-26919928'
    places = geograpy.get_geoPlace_context(url=url)
    if self.debug:
        print(places)
    self.assertEqual(['Moscow', 'Donetsk', 'Brussels', 'Kharkiv', 'Russia'], places.cities)
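Outside of the test class, the call looks roughly like this (a minimal sketch using the same URL; the attribute names follow the test above and the project's readme):
import geograpy

url = 'http://www.bbc.com/news/world-europe-26919928'
places = geograpy.get_geoPlace_context(url=url)
# only GPE-labelled entities are kept, so PERSON and ORGANIZATION names should no longer appear
print(places.countries)
print(places.cities)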

Related

How to list multiple artists of the song?

I am new to Spotipy, so I need help.
searched = sp.search(q=search, type="track", limit=1)
artist = searched['tracks']['items'][0]['artists'][0]['name']
search comes from my Discord bot and the search itself works perfectly fine; all the data is there. But I couldn't figure out how to get all of the artists of the song. I know it has something to do with the [0] there, because that just gets the first element. Any fixes?
Try this and report back what the output looks like:
artist = searched['tracks']['items'][0]['artists']
I believe that artist is now an array of all the artists of that song (the first track returned from the search). From there you should be able to loop through this array to get all of the artists; I think it's then:
artist[i]['name']
that would give you the name of each artist.
It's possible that there is another key needed after ['artists']; if you output the value of artist to the console, you can work out the exact path to what you need.
The standard-library module pprint prints dictionaries (and plenty of other things) in a nicely formatted way. I would recommend giving it a go to help you find the right keys, i.e.
import pprint as p
p.pprint(artist)
The return from sp.search can be overwhelming at first, as it includes so much data (most of which you probably don't need), but I found looking at the dictionary with pprint was really helpful.
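Putting that together, a minimal sketch (assuming searched is the result of the sp.search call from the question):
# searched = sp.search(q=search, type="track", limit=1), as in the question
track = searched['tracks']['items'][0]
artist_names = [artist['name'] for artist in track['artists']]
print(", ".join(artist_names))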

Getting a list of VALUES rather than CLASSES in sqlalchemy

This problem seems very simple, but I'm having trouble finding an already existing solution on StackOverflow.
When I run a sqlalchemy command like the following
valid_columns = db.session.query(CrmLabels).filter_by(user_id=client_id).first()
I get back a CrmLabels object that is not iterable. If I print this object, the output looks like a list:
[Convert Source, Convert Medium, Landing Page]
But it is not iterable. I would like to get exactly what is shown above, except as a list of strings:
['Convert Source', 'Convert Medium', 'Landing Page']
How can I run a query that will return this result?
The change below should do it:
valid_columns = (
    db.session.query(CrmLabels).filter_by(user_id=client_id)
    .statement.execute()  # just add this
    .first()
)
However, you need to be certain about the order of columns, and you can use valid_columns.keys() to make sure the values are in the expected order.
Alternatively, you can create a dictionary using dict(valid_columns.items()).
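If statement.execute() complains that the statement is not bound to an engine, the same idea can go through the session explicitly. A minimal sketch, assuming SQLAlchemy 1.x and the CrmLabels / client_id names from the question:
# Execute the underlying SELECT instead of loading a CrmLabels ORM object,
# so the result is a plain row of column values rather than a mapped instance.
row = db.session.execute(
    db.session.query(CrmLabels).filter_by(user_id=client_id).statement
).first()

print(list(row.keys()))                        # column names, in their SELECT order
valid_columns = [str(value) for value in row]  # the values as a list of strings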

How to use a conditional statement while scraping?

I'm trying to scrape the MTA website and need a little help scraping the "Train Lines Row". (Website for reference: https://advisory.mtanyct.info/EEoutage/EEOutageReport.aspx?StationID=All)
The train line information is stored as image files (1 line subway, A line subway, etc.) describing each line that's accessible through a particular station. I've had success scraping info out of rows in which only one train passes through, but I'm having difficulty figuring out how to iterate through the columns that have multiple trains passing through them, using a conditional statement to test whether a row has one line or multiple lines.
tableElements = table.find_elements_by_tag_name('tr')
That's the table I'm iterating through.
tableElements[2].find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_element_by_tag_name('img').get_attribute('alt')
This successfully gives me the value when only one image exists in that particular column.
tableElements[8].find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_elements_by_tag_name('img')
This successfully gives me a list of elements that I can iterate through to extract the values I need.
Now I try to combine these lines of code in a for loop to extract all the information without stopping.
for info in tableElements[1:]:
    if info.find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_elements_by_tag_name('img')[1] == True:
        for images in info.find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_elements_by_tag_name('img'):
            print(images.get_attribute('alt'))
    else:
        print(info.find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_element_by_tag_name('img').get_attribute('alt'))
I'm getting the error message "list index out of range". I don't know why, as every iteration done in isolation seems to work. My hunch is that I haven't used the boolean operation properly here. My idea was that if find_elements_by_tag_name had an element at index [1], that would mean there are multiple image texts for me to iterate through. Hence why I want to use this boolean operation.
Hi All, thanks so much for your help. I've uploaded my full code to Github and attached a link for your reference: https://github.com/tsp2123/MTA-Scraping/blob/master/MTA.ElevatorData.ipynb
The end goal is to put this information into a dataframe, using some formulation of a for loop that will extract the image information that I want.
dataframe = []
for elements in tableElements:
    row = {}
    columnName1 = find_element_by_class_name('td')
    ..
Your logic isn't off here.
"My hunch is I haven't correctly used the boolean operation properly here. My idea was that if find_elements_by_tag_name had an index of [1] that would mean multiple image text for me to iterate through."
The problem is that it can't check whether the statement is True when there is nothing at index position [1]. Hence the error at this point:
if info.find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_elements_by_tag_name('img')[1] == True:
What you want to do is use try/except. So something like:
for info in tableElements[1:]:
    try:
        if info.find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_elements_by_tag_name('img')[1] == True:
            for images in info.find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_elements_by_tag_name('img'):
                print(images.get_attribute('alt'))
        else:
            print(info.find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_element_by_tag_name('img').get_attribute('alt'))
    except:
        # do something else
        print('Nothing found in index position.')
Is it also possible to go back to your question and provide the full code? When I try this, I'm getting 11 table elements, so I want to test it against the specific table you're trying to scrape.
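As a side note, comparing a WebElement to True with == will always be False, so a check on the length of the list is usually more reliable, and find_elements_by_tag_name already searches all descendants of the cell. A cleaner variant of the same loop might look like this (a sketch, not tested against the live page):
for info in tableElements[1:]:
    cells = info.find_elements_by_tag_name('td')
    if len(cells) < 2:
        continue  # header or malformed row: no train-line cell to read
    # every train-line image in the second cell, whether there is one or several
    images = cells[1].find_elements_by_tag_name('img')
    if images:
        for image in images:
            print(image.get_attribute('alt'))
    else:
        print('No train-line images in this row.')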

Error uploading CSV file on cloud Jupyter notebook

I have set up a Google Cloud account because I want to run my deep learning much faster in a Jupyter notebook, but I cannot find a way to read my CSV file.
I downloaded it with wget from my GitHub account and afterwards I tried
dataset = pd.read_csv('/home/user/.jupyter/SIEMENSTRAIN.csv')
but I get the following error
pandas.parser.CParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 12
Why? When I read it on my laptop in my Jupyter notebook, everything runs fine.
Any suggestions?
I tried the recommended solutions for this error and I got the following warning:
/home/user/anaconda3/lib/python3.5/site-packages/ipykernel/__main__.py:1: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators; you can avoid this warning by specifying engine='python'.
  if __name__ == '__main__':
When I ran dataset.head() this is what appeared
Any help please?
There are a number of possibilities that could be causing the problem. First, I would always make sure that your pandas (pd) version is up to date and compatible.
The more likely cause is that the CSV itself is not right, so pd.read_csv() cannot parse it correctly (hence the parser error). This may have something to do with the headers, though I'm not sure what your original CSV file looks like. It's worth playing around with read_csv, for example:
df = pandas.read_csv(fileName, sep='delimiter', header=None)
This changes two things: the delimiter, and whether pandas reads a header row from the CSV or not.
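Before guessing at parameters, it can also help to peek at the first few raw lines of the file to see which separator it actually uses (a small sketch; the path is the one from the question):
# Print the first few raw lines so stray HTML or an unexpected delimiter is easy to spot.
with open('/home/user/.jupyter/SIEMENSTRAIN.csv') as f:
    for _ in range(5):
        print(repr(f.readline()))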
I go through some pd.read_csv() stuff in my book about Stock Prediction (another cool Machine Learning problem) and Deep Learning; feel free to check it out.
Good Luck!
I tried what you proposed and this is what I got
So, any suggestions?
I suppose that the path is ok, but it just won't be read properly, or am I wrong?

Name error when calling defined function in Jupyter

I am following a tutorial over at https://blog.patricktriest.com/analyzing-cryptocurrencies-python/ and I've got a bit stuck. I am trying to define, then immediately call, a function.
My code is as follows:
def merge_dfs_on_column(dataframes, labels, col):
    '''merge a single column of each dataframe on to a new combined dataframe'''
    series_dict = {}
    for index in range(len(dataframes)):
        series_dict[labels[index]] = dataframes[index][col]
    return pd.DataFrame(series_dict)

# Merge the BTC price dataseries into a single dataframe
btc_usd_datasets = merge_dfs_on_column(list(exchange_data.values()), list(exchange_data.keys()), 'Weighted Price')
I can clearly see that I have defined the merge_dfs_on_column function and I think the syntax is correct; however, when I call the function on the last line, I get the following error:
NameError                                 Traceback (most recent call last)
<ipython-input-22-a113142205e3> in <module>()
      1 # Merge the BTC price dataseries into a single dataframe
----> 2 btc_usd_datasets = merge_dfs_on_column(list(exchange_data.values()), list(exchange_data.keys()), 'Weighted Price')

NameError: name 'merge_dfs_on_column' is not defined
I have Googled for answers and carefully checked the syntax, but I can't see why that function isn't recognised when called.
Your function definition isn't getting executed by the Python interpreter before you call the function.
Double check what is getting executed, and when. In Jupyter it's possible to run code out of input order, which seems to be what you are accidentally doing (perhaps try 'Run All').
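For illustration, the execution order matters roughly like this (a trivial sketch, not the question's actual code):
# Cell 1 -- must be executed first so the name exists in the kernel
def double(x):
    return x * 2

# Cell 2 -- calling this before Cell 1 has run raises NameError: name 'double' is not defined
print(double(21))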
Well, if you are defining the function yourself, then you have probably copied and pasted it directly from somewhere on the web, and it might contain characters that you are not able to see.
Just retype the function definition by hand, use pass as its body, comment out the rest of the code, and see whether it works or not.
"run all" does not work.
Shutting down the kernel and restarting does not help either.
If I write:
def whatever(a):
    return a*2
whatever("hallo")
in the next cell, this works.
I have also experienced this kind of problem frequently in Jupyter notebooks, but after replacing %% with %%time the error was resolved. I didn't know why.
After some browsing I found that this is not a Jupyter notebook issue but an IPython issue; here is the issue, and this problem is also answered in this Stack Overflow question.
