Use Case Apache Spark getting NameError running a line with countryCodeMap

I'm very new to this and I'm working through a use case found on databricks.com to learn more (https://databricks.com/blog/2018/07/09/analyze-games-from-european-soccer-leagues-with-apache-spark-and-databricks.html).
I'm running Spark through Jupyter Notebook with Python 3. I have been able to load the files, etc., but I'm getting a NameError for one of the lines. It says the name has not been defined, but I can't see where or how to define it.
The line is this:
gameInfDf = gameInfDf.withColumn("country_code", mapKeyToVal(countryCodeMap)("country"))
The NameError is: name 'countryCodeMap' is not defined
Before this I ran this code chunk:
def mapKeyToVal(mapping):
def mapKeyToVal_(col):
return mapping.get(col)
return udf(mapKeyToVal_, StringType())
Can someone please tell me whether I'm running it in the wrong program, or what my problem is?
Thank you very much in advance.

Compare your code with https://databricks.com/blog/2018/07/09/analyze-games-from-european-soccer-leagues-with-apache-spark-and-databricks.html. You are missing the indentation before your return. I'm not sure how you were able to run this part; I get an error when I try to define the UDF.
Try this:
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType
def mapKeyToVal(mapping):
    def mapKeyToVal_(col):
        return mapping.get(col)
    return udf(mapKeyToVal_, StringType())
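Note that the NameError in the question names countryCodeMap, not the helper function, so the dictionary itself also has to be defined before the withColumn line runs. The following is only a sketch in plain Python, without Spark: the inner function is returned directly instead of being wrapped in udf, and the two-entry mapping is a hypothetical stand-in (the tutorial's real keys and values may differ).

```python
def mapKeyToVal(mapping):
    """Plain-Python stand-in for the UDF factory: returns a lookup bound to `mapping`."""
    def mapKeyToVal_(col):
        return mapping.get(col)
    return mapKeyToVal_


# Using the name before it is bound reproduces the error from the question.
try:
    lookup = mapKeyToVal(countryCodeMap)
except NameError as exc:
    message = str(exc)

# Hypothetical mapping; replace with the dictionary from the tutorial.
countryCodeMap = {"germany": "DEU", "france": "FRA"}

lookup = mapKeyToVal(countryCodeMap)
code = lookup("germany")      # key present in the mapping
missing = lookup("narnia")    # dict.get returns None for unknown keys
```

In a notebook this usually means running the cell that defines countryCodeMap before the cell that uses it.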

Related

Except python does not catch Windows error FileNotFoundError | "The system cannot find the specified path." [duplicate]

I have this python code:
import os
try:
    os.system('wrongcommand')
except:
    print("command does not work")
The code prints:
wrongcommand: command not found
Instead of command does not work. Does anyone know why it's not printing my error message?

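If you want an exception to be thrown when the command doesn't exist, you should use subprocess:
import subprocess
try:
    subprocess.run(['wrongcommand'], check=True)
except FileNotFoundError:
    # raised when the executable cannot be found
    print('wrongcommand does not exist')
except subprocess.CalledProcessError:
    # raised by check=True when the command runs but exits with a non-zero status
    print('wrongcommand failed')
Come to think of it, you should probably use subprocess instead of os.system anyway ...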

Because os.system() indicates a failure through its return value:
return value == 0 -> everything ok
return value != 0 -> some error
The exit status of the called command is passed back to Python (on Unix encoded as a wait status, on Windows as the exit code returned by the shell).
There is no documentation telling you that os.system() would raise an exception in case of a failure. os.system() just calls the underlying system() call of the OS and returns its return value.
Please read the os.system() documentation carefully.

Although subprocess might be your best friend, os.system is still useful in some places, especially for programmers who are used to working in a C/C++ style.
In that case, check the return value yourself:
import os
try:
    os_cmd = 'wrongcommand'
    if os.system(os_cmd) != 0:
        raise Exception('wrongcommand does not exist')
except Exception:
    print("command does not work")

There are two problems in your code snippet. First of all, never just do try: ... except:, always be specific about which exception you want to handle. Otherwise, your program simply swallows any kind of error, also those that you do not expect. In most cases, this will lead to unexpected behavior at some other point during runtime.
Furthermore, os.system() calls should most of the time be replaced by their counterparts from the subprocess module.
To see what goes wrong, leave out the try/except block and actually look at the traceback/exception. As others have pointed out, you will notice that there is no exception in your case which is why your custom string is not printed.
Bottom line: think about which specific exceptions can occur in your code block. Think hard about which of them you expect to happen for certain reasons and handle those appropriately. Do not handle those that you do not expect.
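As a rough illustration of handling only the exceptions you expect, here is a sketch that separates "the executable was not found" from "the command ran but failed". The helper name run_command and the non-existent command name are made up for this example; sys.executable is used only so the sketch runs without any extra tools installed.

```python
import subprocess
import sys


def run_command(args):
    """Run `args` and return a short status string instead of raising."""
    try:
        subprocess.run(args, check=True, capture_output=True)
    except FileNotFoundError:
        # The executable itself could not be found.
        return "not found"
    except subprocess.CalledProcessError as exc:
        # The command ran but exited with a non-zero status.
        return f"failed with exit code {exc.returncode}"
    return "ok"


# Hypothetical command name, chosen so that it should not exist on any system.
status_missing = run_command(["wrongcommand-that-should-not-exist"])
# sys.executable is the running Python interpreter.
status_failed = run_command([sys.executable, "-c", "import sys; sys.exit(3)"])
status_ok = run_command([sys.executable, "-c", "pass"])
```

Any other exception, such as a PermissionError, is deliberately not caught here and would surface as a normal traceback.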

wrongcommand: command not found is the output of the shell that os.system uses to invoke the command. os.system itself did not throw an exception.
EDIT: edited by copy-and-pasting part of mgilson's comment

Another simple way is:
import os
def dat():
    if os.system('date') == 0:
        print("Command successfully executed")
    else:
        print("Command failed to execute")
dat()

Python Error Handling while base decoding

I am trying to do the following:
I read a base64-encoded code from a QR code and then decode it.
If an error occurs during the decoding, I want to set an error variable to 1 and then continue without exiting the program.
I haven't found a solution. Does anyone have an idea how I can handle this?
I tried it with Python's try statement, but I couldn't get it working, or I did something wrong.
Here is a snippet of my code:
secure = base64.b64decode(secure_base).decode("utf-8", "ignore")
number = base64.b64decode(number_base).decode("utf-8", "ignore")
start = int(base64.b64decode(start_base).decode("utf-8", "ignore"))
end = int(base64.b64decode(end_base).decode("utf-8", "ignore"))
thanks a lot.

You can use try and except in Python in the following manner.
try:
    """some intelligent program here, which may sometimes FOOBAR"""
except Exception as e:
    error_received = e
    """Do whatever you want here in case of an error"""
Remember that execution jumps from the try block to the except block at the line where the error/exception occurred; the rest of the try block is skipped.
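Applied to the base64 case from the question, this could look roughly like the sketch below. The helper name decode_int_field is invented for this example, and unlike the question's code it decodes strictly (validate=True and no "ignore" error handler) so that malformed input is reported rather than silently dropped; that is an assumption about what is wanted.

```python
import base64
import binascii


def decode_int_field(value_base):
    """Decode a base64 string to an int; return (value, error), error is 1 on failure."""
    try:
        text = base64.b64decode(value_base, validate=True).decode("utf-8")
        return int(text), 0
    except (binascii.Error, UnicodeDecodeError, ValueError):
        # binascii.Error: malformed base64; UnicodeDecodeError: bytes are not UTF-8;
        # ValueError: the decoded text is not an integer.
        return None, 1


# Stand-in for a value read from the QR code.
start_base = base64.b64encode(b"42").decode("ascii")
start, error = decode_int_field(start_base)

# Malformed input sets the error flag and the program carries on.
bad_start, bad_error = decode_int_field("not base64!")
```

The caller can check the returned error value and continue with the next field instead of exiting.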

Message: Tried to run command without establishing a connection

New to this, apologies for the novice question.
Trying to run a script using Python, Selenium and the unittest module. Have the typical setUp(), test_1, test_2, tearDown() method structure. Since I've added in more than one test, I get the following error:
selenium.common.exceptions.InvalidSessionIdException: Message: Tried to run command without establishing a connection
How can I resolve this?
I have looked into similar problems people have been facing with this issue, but in almost all cases the issue is not related to anything I am encountering (cronjobs for example)
My program looks like this...
class MyTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        #my setup code here...
        cls.driver = webdriver.Firefox(executable_path='my_gecko_driver')
        cls.driver.get('www.my_url.com')
        cls.driver...... # various other tasks
    def test_1(self):
        # my test code here....
        foo = self.driver.find_element_by_xpath('/button_elem/')
        foo.click()
        # etc etc....
    def test_2(self):
        # my test code here....
        bar = self.driver.find_element_by_xpath('/button_elem/')
        bar.click()
        # etc etc....
    @classmethod
    def tearDown(cls):
        print('Entered tearDown function.')
        # close the browser window
        cls.driver.close()
if __name__ == '__main__':
    unittest.main()
Before I added the second test, the test ran successfully.
Now I am getting the error:
selenium.common.exceptions.InvalidSessionIdException: Message: Tried to run command without establishing a connection
I suspect this is to do with the tearDown method perhaps not working correctly? However I thought this method was called at the end of every test_x upon finishing.
I have also noticed that PyCharm is highlighting 'driver' in the line 'cls.driver.close()', which I am also not too sure about. It says 'unresolved attribute reference'; however, is this not created in the setUp() method?

Try switching explicitly between tabs before closing them.
main_page = driver.window_handles[0]
driver.switch_to.window(main_page)

This is because multiple browser sessions are open on your machine.
If you are on Linux, run the command
killall firefox
and try to run your script again. This should fix the error for you.
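Another possible cause, in line with the suspicion in the question: unittest runs tearDown after every test, while setUpClass runs only once per class, so a driver created in setUpClass and closed in tearDown is already closed when the second test starts. The sketch below shows the setUpClass/tearDownClass pairing without a browser. FakeDriver is a hypothetical stand-in for the Selenium driver, and whether this is the actual cause in the asker's setup is an assumption.

```python
import unittest


class FakeDriver:
    """Hypothetical stand-in for a Selenium WebDriver; tracks whether the session is open."""

    def __init__(self):
        self.open = True

    def click(self):
        if not self.open:
            raise RuntimeError("Tried to run command without establishing a connection")

    def quit(self):
        self.open = False


class MyTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        cls.driver = FakeDriver()   # created once for the whole class

    def test_1(self):
        self.driver.click()

    def test_2(self):
        self.driver.click()

    @classmethod
    def tearDownClass(cls):
        cls.driver.quit()           # closed once, after the last test


suite = unittest.defaultTestLoader.loadTestsFromTestCase(MyTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The alternative is to keep tearDown but create the driver in setUp, so that each test gets its own session.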

Name error when calling defined function in Jupyter

I am following a tutorial over at https://blog.patricktriest.com/analyzing-cryptocurrencies-python/ and I've got a bit stuck. I am trying to define, then immediately call, a function.
My code is as follows:
def merge_dfs_on_column(dataframes, labels, col):
    '''merge a single column of each dataframe on to a new combined dataframe'''
    series_dict = {}
    for index in range(len(dataframes)):
        series_dict[labels[index]] = dataframes[index][col]
    return pd.DataFrame(series_dict)
# Merge the BTC price dataseries into a single dataframe
btc_usd_datasets = merge_dfs_on_column(list(exchange_data.values()), list(exchange_data.keys()), 'Weighted Price')
I can clearly see that I have defined the merge_dfs_on_column function and I think the syntax is correct. However, when I call the function on the last line, I get the following error:
NameError Traceback (most recent call last)
<ipython-input-22-a113142205e3> in <module>()
1 # Merge the BTC price dataseries into a single dataframe
----> 2 btc_usd_datasets= merge_dfs_on_column(list(exchange_data.values()),list(exchange_data.keys()),'Weighted Price')
NameError: name 'merge_dfs_on_column' is not defined
I have Googled for answers and carefully checked the syntax, but I can't see why that function isn't recognised when called.

Your function definition isn't getting executed by the Python interpreter before you call the function.
Double check what is getting executed and when. In Jupyter it's possible to run code out of input-order, which seems to be what you are accidentally doing. (perhaps try 'Run All')
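One way to check this is to ask the kernel's namespace whether the name is currently bound. The sketch below simulates that with a plain dictionary; is_defined and notebook_namespace are names made up for this example, and in a real notebook you would pass globals() instead.

```python
def is_defined(name, namespace):
    """Return True if `name` is bound to a callable in `namespace`."""
    return callable(namespace.get(name))


# Stand-in for a fresh kernel's globals(); nothing has been run yet.
notebook_namespace = {}
before = is_defined("merge_dfs_on_column", notebook_namespace)

# Simulate running the cell that contains the definition.
definition_cell = "def merge_dfs_on_column(dataframes, labels, col):\n    return None\n"
exec(definition_cell, notebook_namespace)
after = is_defined("merge_dfs_on_column", notebook_namespace)
```

If the check is still False after you believe the definition cell has run, the cell probably did not execute as Python code at all.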

Well, if you are defining it yourself, you have probably copied and pasted it directly from somewhere on the web, and it may contain characters that you cannot see.
Define the function by typing it by hand with just pass as its body, comment out the other code, and see whether it works.

"run all" does not work.
Shutting down the kernel and restarting does not help either.
If I write:
def whatever(a):
    return a*2
whatever("hallo")
in the next cell, this works.

I have also experienced this kind of problem frequently in Jupyter Notebook.
After replacing %% with %%time the error was resolved, although I didn't know why.
After some browsing, I found that this is not a Jupyter Notebook issue but an IPython issue, and that the problem is also answered in another Stack Overflow question.

AssertionError when using self-defined nested list in Pyspc

I installed pyspc and ran it successfully in Jupyter Notebook using the original samples.
But when I tried introducing a self-defined nested list, an error message showed up.
pyspc library: https://github.com/carlosqsilva/pyspc
from pyspc import *
import numpy
abc=[[2,3,4],[4,5.6],[1,4,5],[3,4,4],[4,5,6]]
a=spc(abc)+xbar_rbar()+rules()+rbar()
print(a)
The result is an AssertionError.
Thank you for any advice on what went wrong and how to fix it.

Check the data: you have accidentally used . instead of , in [4,5.6], the second element of the list, so that row has two values instead of three.
Here is the corrected data
abc=[[2,3,4],[4,5,6],[1,4,5],[3,4,4],[4,5,6]]
Hope this will help.
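A typo like this is easy to miss by eye, so a quick length check before passing the data on can help. This is only a sketch: find_ragged_rows is a helper invented for this example, and it assumes the AssertionError is caused by rows of unequal length, which the question does not confirm.

```python
def find_ragged_rows(data):
    """Return (index, row) pairs whose length differs from the first row's."""
    expected = len(data[0])
    return [(i, row) for i, row in enumerate(data) if len(row) != expected]


# Data from the question: the second row contains 5.6 instead of 5, 6.
abc = [[2, 3, 4], [4, 5.6], [1, 4, 5], [3, 4, 4], [4, 5, 6]]
ragged = find_ragged_rows(abc)

# Corrected data: every row now has three values.
fixed = [[2, 3, 4], [4, 5, 6], [1, 4, 5], [3, 4, 4], [4, 5, 6]]
ragged_after_fix = find_ragged_rows(fixed)
```

An empty result means all rows have the same number of values.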
