Pandas for Excel and selenium loop - python-3.x

I am trying to print values from an Excel sheet; the values are numbers. My goal is to read these values and search for them on Google one by one. When a value is 'nan', the program should pause for x seconds, skip that 'nan', and move on to the next value.
Problems faced:
The values print in scientific notation.
I want the program to pause when it reaches a 'nan' in the Excel sheet.
I copy UPC[i] into the Google search box, but I only want to copy it once, because I plan to open a new tab and then copy the next UPC[i] there.
My solution:
I pass 'lambda x: '%0.2f' % x' to set_option so the values print as xxxxxx.00 with 2 decimals. I would prefer an int, but this is already better than scientific notation.
I used 'if' to check whether upc[i] equals 'nan' (nan is what print shows), but it still prints the range of 20 values including 'nan'.
I can't think of anything for this one yet.
Code:
import pandas as pd
import numpy as np
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.keys import Keys
from selenium.webdriver import ActionChains
import msvcrt
import datetime
import time
driver = webdriver.Chrome()
#Settings
pd.set_option('display.width',10, 'display.max_rows', 10, 'display.max_colwidth',100, 'display.width',10, 'display.float_format', lambda x: '%0.2f' % x)
df = pd.read_excel(r"BARCODE.xlsx", skiprows = 2, sheet_name = 'products id')
#Unnamed: 1 is also an empty column, i just didn't input UPC as title in excel.
upc = df['Unnamed: 1']
#I can't print out as integer...It will always have a xxxxx.0
print((upc[0:20]))
count = len(upc)
i = 0
for i in range(count ):
    if upc[i] == 'nan':
        'skip for x seconds and continue, i am not sure how to do yet'
    else:
        print(int(upc[i]))
        driver.get('https://www.google.com')
        driver.find_element_by_name('q').send_keys(int(upc[i]))
        i = i + 1
Print out:
3337872411991.0
3433422408159.0
3337875598071.0
3337872412516.0
3337875518451.0
3337875613491.0
3337872413025.0
3337875398961.0
3337872410208.0
nan <- i want program to stop here so i can do something else.
3337872411991.0
3433422408159.0
3337875598071.0
3337872412516.0
3337875518451.0
3337875613491.0
3337872413025.0
3337875398961.0
3337872410208.0
nan
Name: Unnamed: 1, Length: 20, dtype: float64
3337872411991
3433422408159
3337875598071
3337872412516
3337875518451
etc....
I googled number formatting, such as setting the print format, but I got confused between .format and lambda.

It is printing out in scientific notation format
It seems your numbers are codes such as UPCs and EANs.
You can probably solve that by treating the numbers as text instead. If you always need a length of 13, you can pad them with leading zeros.
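A minimal sketch of the text approach, written in plain Python so it runs without a spreadsheet. The helper name upc_to_text and the sample values are made up for illustration, and it assumes each cell arrives as a float, as in the printed output of the question.

```python
import math


def upc_to_text(value, width=13):
    """Return the code as zero-padded text, or None for a missing (NaN) cell."""
    if isinstance(value, float) and math.isnan(value):
        return None
    # int() drops the trailing ".0"; zfill() restores any leading zeros.
    return str(int(value)).zfill(width)


# Hypothetical cells: a 13-digit code, an empty cell, and a 12-digit code.
cells = [3337872411991.0, float("nan"), 123456789012.0]
print([upc_to_text(cell) for cell in cells])
```

With pandas, reading the column as text in the first place (for example through the dtype argument of read_excel) may avoid the float conversion altogether.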
Want to stop doing something when its nan in excel
The simplest solution could be to call input() and accept any character before continuing. If you only want to pause for a few seconds, time.sleep() works as well.
Copy UPC[i] into google search, but i wanted to only copy once, due to i want to design it open new tab then copy the second UPC[i]
Some points you may want to reconsider:
Iterating in Python can be done with enumerate() if you need index values. If you do not need the index, simply iterate over the values: for value in data_frame['UPC']:
With Selenium you can scrape the results directly instead of opening new tabs.
Below is a working example (at least on my machine with Python 3, Windows 10, and the Chrome exe driver).
import pandas as pd
from time import sleep
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.keys import Keys
# Settings
pd.set_option('display.width', 10, 'display.max_rows', 10, 'display.max_colwidth', 100, 'display.width', 10,
              'display.float_format', lambda x: '%0.2f' % x)
data_frame = pd.read_excel('test.xlsx', sheet_name='products id', skip_blank_lines=False)
# I have chrome driver in exe, so this is how I need to inject it to get driver out
driver = webdriver.Chrome('chromedriver.exe')
google = 'https://www.google.com'
for index, value in enumerate(data_frame['UPC']): # named the column in excel file
    if pd.isna(value):
        print('{}: zzz'.format(index))
        sleep(2) # will sleep for 2 seconds, use input() if you want to wait indefinitely instead
    else:
        print('{}: {} {}'.format(index, value, type(value)))
        # since given values are float, you can convert it to int
        value = int(value)
        driver.get(google)
        google_search = driver.find_element_by_name('q')
        google_search.send_keys(value)
        google_search.send_keys('\uE007') # this is "ENTER" for committing your search in google or Keys.ENTER
        sleep(0.5)
        # you may want to wait a bit before page loads fully, then scrape info you want
        # also consider using try-except blocks if something unexpected happens
        # if you want to open new tab (windows + chrome driver)
        # open a link in a new window - workaround
        helping_link = driver.find_element_by_link_text('Help')
        actions = ActionChains(driver)
        actions.key_down(Keys.CONTROL).click(helping_link).key_up(Keys.CONTROL).perform()
        driver.switch_to.window(driver.window_handles[-1])
# close your instance of chrome driver or leave it if you need your tabs
# driver.close()

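For the 'nan' check, see this post. Note that a float has no .isnull() method, so use pd.isnull() instead:
if pd.isnull(upc[i]):
    time.sleep(3)
For opening a new tab, check out this post, which boils down to:
driver.execute_script("window.open('https://www.google.com');")
driver.switch_to.window(driver.window_handles[-1])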

Related

Selenium, Python - Use For-loop in finding values in a column, but resulted in duplicating first values instead

Above is the practice website from which I am attempting to extract only the name column. However, my for-loop prints the name Alan five times (see img below). My for-loop also counts the header row as row 0.
Code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
#Scenario a HTML table is provided
#Feature: HTML table
def test_table():
    URL = 'https://testpages.herokuapp.com/styled/tag/table.html'
    #Given website loaded a table <--The Set Up
    b=webdriver.Chrome()
    b.get(URL)
    wait=WebDriverWait(b,20)
    #When user arrives at the website and sees a table <-- IGNORE, when print text, all values are correct.
    table=b.find_element(By.ID,'mytable')
    #Then the table shows a column of names <-- The issue
    rows=b.find_elements(By.TAG_NAME,'tr')
    for i, col in enumerate(rows):
        col=b.find_element(By.TAG_NAME,'td').text
        print(f'Row: {i}, Name: {col}')
    b.quit()
test_table()
Result:
I had thought of using an expected condition, but the table is static, so I decided it wasn't necessary. I also considered whether the tag name was used incorrectly, but that wasn't the case. Any assistance is appreciated.
You are traversing the whole page instead of just the rows. Since you search for a single element from the driver inside the loop, it returns the first td element of the page every time. Search within each row instead:
for i, col in enumerate(rows):
    name = col.find_element(By.TAG_NAME,'td').text
    print(f'Row: {i}, Name: {name}')
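One caveat, offered as a sketch rather than a tested Selenium run: a header row usually holds th cells, so looking up a single td inside it can fail. Collecting cells with find_elements and skipping rows that have no td sidesteps that and also drops the header from the count. To keep the example runnable without a browser, FakeRow and FakeCell are hypothetical stand-ins for Selenium's row and cell elements, the table contents are invented, and "tag name" is the string value of By.TAG_NAME.

```python
class FakeCell:
    """Hypothetical stand-in for a Selenium cell element."""

    def __init__(self, text):
        self.text = text


class FakeRow:
    """Hypothetical stand-in for a Selenium <tr> element."""

    def __init__(self, cells_by_tag):
        self._cells_by_tag = cells_by_tag

    def find_elements(self, by, value):
        # Like Selenium's find_elements, returns an empty list when nothing matches.
        return [FakeCell(text) for text in self._cells_by_tag.get(value, [])]


def first_column(rows):
    """Collect the first <td> text of every row that has one."""
    names = []
    for row in rows:
        cells = row.find_elements("tag name", "td")
        if cells:  # header rows with only <th> cells are skipped
            names.append(cells[0].text)
    return names


rows = [
    FakeRow({"th": ["Name", "Amount"]}),
    FakeRow({"td": ["Alan", "12"]}),
    FakeRow({"td": ["Bob", "23"]}),
]
print(first_column(rows))
```

With a real driver, the same function should accept the list returned by b.find_elements(By.TAG_NAME, 'tr').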

ipython: print numbers with thousands separator

I am using ipython 5.8.0 on Debian 10.
This is how output looks like:
In [1]: 50*50
Out[1]: 2500
Is it possible to configure ipython to print all numbers with thousands separators? ie:
In [1]: 50*50
Out[1]: 2'500
In [2]: 5000*5000
Out[2]: 25'000'000
And perhaps, is it possible to make ipython also understand thousands separators on input?
In [1]: 5'000*5'000
Out[1]: 25'000'000
UPDATE
The accepted answer from @Chayim Friedman works for integers, but does not work for float:
In [1]: 500.1*500
Out[1]: 250050.0
Also, when it works, it uses , as the character for thousand separator:
In [1]: 500*500
Out[1]: 250,000
Can I use ' instead?
Using ' as thousands separator in input is quite problematic because Python uses ' to delimit strings, but you can use _ (PEP 515, Underscores in Numeric Literals):
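A short illustration of underscores in numeric literals (Python 3.6 or later). The variable names are only for the example.

```python
# Underscores are ignored by the parser, so they can group digits on input.
product = 5_000 * 5_000
print(product)

# int() accepts the same grouping when parsing text.
parsed = int("25_000_000")
print(parsed == product)
```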
Regarding output, this is slightly harder, but can be done using IPython extensions.
Put the following Python code in a new file at ~/.ipython/extensions/thousands_separator.py:
default_int_printer = None
def print_int(number, printer, cycle):
    printer.text(f'{number:,}') # You can use `'{:,}'.format(number)` if you're using a Python version older than 3.6
def load_ipython_extension(ipython):
    global default_int_printer
    default_int_printer = ipython.display_formatter.formatters['text/plain'].for_type(int, print_int)
def unload_ipython_extension(ipython):
    ipython.display_formatter.formatters['text/plain'].for_type(int, default_int_printer)
This code tells IPython to replace the default int formatter with one that prints thousand separators when this extension is loaded, and restore the original when it is unloaded.
Edit: If you want a different separator, for instance ', replace the f'{number:,}' with f'{number:,}'.replace(',', "'").
You can load the extension using the magic command %load_ext thousands_separator and unload it using %unload_ext thousands_separator, but if you want it always, you can place it in the default profile.
Run the following code in the terminal:
ipython3 profile create
It will report that a file ~/.ipython/profile_default/ipython_config.py was created. Enter it, and search for the following string:
## A list of dotted module names of IPython extensions to load.
#c.InteractiveShellApp.extensions = []
Replace it with the following:
# A list of dotted module names of IPython extensions to load.
c.InteractiveShellApp.extensions = [
'thousands_separator'
]
This tells IPython to load this extension by default.
Done!
Edit: I saw that you want to a) use ' as separator, and b) do the same for floats:
Using different separator is quite easy: just str.replace():
def print_int(number, printer, cycle):
    printer.text(f'{number:,}'.replace(',', "'"))
Doing the same for floats is also easy: just set up print_int so it prints floats too. I also suggest changing the name to print_number.
Final code:
default_int_printer = None
default_float_printer = None
def print_number(number, printer, cycle):
    printer.text(f'{number:,}'.replace(',', "'"))
def load_ipython_extension(ipython):
    global default_int_printer
    global default_float_printer
    default_int_printer = ipython.display_formatter.formatters['text/plain'].for_type(int, print_number)
    default_float_printer = ipython.display_formatter.formatters['text/plain'].for_type(float, print_number)
def unload_ipython_extension(ipython):
    ipython.display_formatter.formatters['text/plain'].for_type(int, default_int_printer)
    ipython.display_formatter.formatters['text/plain'].for_type(float, default_float_printer)
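To check the formatting expression on its own, outside IPython, it can be wrapped in a small function. The name with_apostrophes is hypothetical; the expression is the same one print_number uses above.

```python
def with_apostrophes(number):
    """Format a number with ' as the thousands separator."""
    # The ',' format option groups by thousands; replace() swaps the character.
    return f"{number:,}".replace(",", "'")


print(with_apostrophes(5000 * 5000))
print(with_apostrophes(250050.0))
```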
After update: you can subclass int:
class Int(int):
    def __repr__(self):
        return "{:,}".format(self)
Int(1000)
# 1,000
I don't believe you can achieve everything you are looking for without rewriting the IPython interpreter. Accepting numbers with embedded ' characters on input and ignoring those characters would mean changing the Python language specification. But you can achieve some of it. Subclassing the int class is a good start, but you should also overload the various operators you plan on using. For example:
class Integer(int):
    def __str__(self):
        # if you want ' as the separator:
        return "{:,}".format(self).replace(",", "'")
    def __add__(self, x):
        return Integer(int(self) + x)
    def __mul__(self, x):
        return Integer(int(self) * x)
    """
    define other operations: __sub__, __floordiv__, __mod__, __neg__, etc.
    """
i1 = Integer(2)
i2 = Integer(1000) + 4.5 * i1
print(i2)
print(i1 * (3 + i2))
Prints:
1'009
2'024
Update
It seems that for Python 3.7 you need to override the __str__ method rather than the __repr__ method. This works for Python 3.8 and should work for later releases as well.
Update 2
import locale
locale.setlocale(locale.LC_ALL, '') # needed when the default "C" locale is active, since it applies no grouping
print(locale.format_string("%d", 1255000, grouping=True).replace(",", "'"))
Prints:
1'255'000
An alternative, if you have the Babel package from PyPI:
from babel import Locale
from babel.numbers import format_number
locale = Locale('en', 'US')
locale.number_symbols['group'] = "'"
print(format_number(1255000, locale='en_US'))
Prints:
1'255'000
Or, if you prefer, you can custom-tailor a locale just for this purpose and leave the standard en_US locale unmodified. This also shows how you can parse input values:
from copy import deepcopy
from babel import Locale
from babel.numbers import format_number, parse_number
my_locale = deepcopy(Locale('en', 'US'))
my_locale.number_symbols['group'] = "'"
print(format_number(1255000, locale=my_locale))
print(parse_number("1'125'000", locale=my_locale))
Prints:
1'255'000
1125000
Based on PEP-0378, you can use the following code:
a = 1200
b = 500
c = 10
#res = a
#res = a*b
res = a*b*c
dig = len(str(res)) # to figure out how many digits are required in result
print(format(res, "{},d".format(dig)))
It will produce:
6,000,000

Creating P/L Column in Pandas from an Open and Close Price based on whether it is a long or short position

This one has been bothering me for awhile. I have all the pieces (I think) that work individually to create the output I'm looking for (calculate a profit and loss for a stock), but when put together they return nothing.
The dataframe itself is pretty self-explanatory so I haven't included an example. Basically the series includes Stock Symbol, Opening Time, Opening Price, Closing Time, Closing Price, and whether or not it was a long or short position.
Here's my code to calculate the P-L for a long position:
import pandas as pd
from yahoo_fin import stock_info as si
from datetime import datetime, timedelta, date
import time
def create_df3():
    return pd.read_excel('Base_Sheet.xlsx', sheet_name="Closed_Pos", header=0)
def update_price(sym):
    return si.get_live_price(sym)
long_pl_calc = ((df3['Close_Price']) / (df3['Entry_Price'])) - 1
close_long_pl = df3['P-L'].isnull and (df3['Long_Short'] == 'Long')
for row in df3.iterrows():
    if close_long_pl is True:
        return df3['P-L'].apply(long_pl_calc)
If I print long_pl_calc or close_long_pl, I get exactly what I expect. However, when I iterate through the series to return the calculation, I still end up with a 'NaN' value (but not an error).
Any help would be appreciated! I already know the solution I came to is terrible, but I've also tried at least a dozen other iterations with no success either.
Create a column df3['Long'] with 1 for the dates you are long and 0 for the rest. Then, to get your long P&L (you could do the same for the short side, but don't forget to take the opposite sign of the daily return), you can do:
df3['P&L Long'] = ((df3['Close_Price'] / df3['Entry_Price']) - 1) * df3['Long']
Then your df3['P-L'] will be:
df3['P-L'] = df3['P&L Long'] + df3['P&L Short']
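The sign convention behind those two columns can be sketched in plain Python. This is an illustration under assumptions, not the poster's code: position_return is a hypothetical helper, the trades are invented, and the 'Long'/'Short' labels follow the Long_Short column from the question.

```python
def position_return(entry_price, close_price, side):
    """Fractional P/L of one closed position; a short flips the sign."""
    direction = {"Long": 1, "Short": -1}[side]
    return (close_price / entry_price - 1) * direction


# Invented trades: (symbol, entry price, close price, side).
trades = [
    ("AAA", 100.0, 110.0, "Long"),   # price rose, long position gains
    ("BBB", 50.0, 45.0, "Short"),    # price fell, short position gains
]
for symbol, entry, close, side in trades:
    print(symbol, round(position_return(entry, close, side), 4))
```

In the pandas version, multiplying by a 1/0 'Long' column and a separate short column achieves the same effect without a loop.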

Compare Percent Change values inside a for loop

I have a list like below
I want to be able to compare the percent change of QQQ at 9:35 to various other stocks like AAPL and AMD at the same time. So check if percent change of AAPL at 9:35 is greater than percent change of QQQ at 9:35. Same thing for AMD at 9:35 and then at 9:40 and then 9:45 and so on.
I want to do this in Python.
This is what I have so far, but it is not quite correct:
import pandas as pd
import time
import yfinance as yf
import datetime as dt
from pandas_datareader import data as pdr
from collections import Counter
from tkinter import Tk
from tkinter.filedialog import askopenfilename
import os
from pandas import ExcelWriter
d1 = dt.datetime(2020, 8, 5,9,00,00)
d2 = dt.datetime(2020, 8, 5,16,00,00)
pc=Counter()
filePath=r"C:\Users\Adil\Documents\Python Test - ET\Data\Trail.xlsx"
stocklist = pd.read_excel(filePath)
for i in stocklist.index:
    symbol=stocklist['Symbol'][i]
    date=stocklist['Date'][i]
    close=stocklist['Close'][i]
    pc=stocklist['PercentChange'][i]
    if (pc[i]>pc['QQQ']):
        print(pc[i])
Alright, a comment explains what the OP wants:
Yes so i want to see if within a given time 5 min time period if a stock performed better than QQQ
The first thing you need to do is make it possible to look up your information by time and symbol. Here is how I would do that:
my_data = {}
for i in stocklist.index:
    symbol=stocklist['Symbol'][i]
    date=stocklist['Date'][i]
    pc=stocklist['PercentChange'][i]
    my_data[symbol, date] = pc
That makes a dictionary where you can lookup percent changes by calling my_data['ABCD', 'datetime']
Then, I would make a list of all the times.
time_set = set()
for i in stocklist.index:
    date = stocklist['Date'][i]
    time_set.add(date)
times = list(time_set)
times.sort()
If you are pressed for computer resources, you could combine those two loops and run them together, but I think having them separate makes the code easier to understand.
And then do the same thing for symbols:
sym_set = set()
for i in stocklist.index:
    symbol = stocklist['Symbol'][i]
    sym_set.add(symbol)
symbols = list(sym_set)
symbols.sort()
Once again, you could have made this set during the first for-loop, but this way you can see what we are trying to accomplish a bit better.
The last thing to do is actually make the comparisons:
for i in times:
    qs = my_data['QQQ', i]
    for j in symbols:
        if j != 'QQQ':
            which = "better" if my_data[j, i]>qs else "worse"
            print(j + " did " + which + " than QQQ at " + str(i))
Now, this just prints the information out to the console, you should replace the print command with however you want to output it. (Also, I assumed higher was better; I hope that was right.)
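The steps above can be combined into one self-contained sketch that runs on plain tuples instead of the Excel sheet. The function name compare_to_benchmark and the sample percent changes are invented for illustration, and it assumes every time slot has a QQQ entry.

```python
def compare_to_benchmark(rows, benchmark="QQQ"):
    """Compare each symbol's percent change with the benchmark at each time."""
    # Lookup table keyed by (symbol, time), as in my_data above.
    by_key = {(symbol, when): change for symbol, when, change in rows}
    times = sorted({when for _, when, _ in rows})
    symbols = sorted({symbol for symbol, _, _ in rows} - {benchmark})
    results = []
    for when in times:
        base = by_key[benchmark, when]
        for symbol in symbols:
            if (symbol, when) in by_key:  # skip symbols with no row at this time
                verdict = "better" if by_key[symbol, when] > base else "worse"
                results.append((when, symbol, verdict))
    return results


# Invented sample data: (symbol, time, percent change).
rows = [
    ("QQQ", "09:35", 0.10), ("AAPL", "09:35", 0.25), ("AMD", "09:35", -0.05),
    ("QQQ", "09:40", 0.30), ("AAPL", "09:40", 0.20), ("AMD", "09:40", 0.45),
]
for when, symbol, verdict in compare_to_benchmark(rows):
    print(f"{symbol} did {verdict} than QQQ at {when}")
```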

How to get rid of value 'None' after printing a function with the Try and Except block

I've written a function like the following. It's fairly self-explanatory, and I can't properly summarize my task. The problem is that my try and except block keeps producing the value 'None', which breaks my next task, where I try to put the results in a list and convert it into a numpy array. data_dict is a dictionary that contains every attribute (as keys) of a data file I'm working on in this task.
TLDR: how can I stop the try and except block from producing the value "None", or is there another way to carry out my task?
I'm just 4 weeks into Python and have no previous coding experience. Also, I'm using Jupyter Notebook. I've tried adding an else block to get rid of the value, but that only made it worse.
import datetime
def compute_opening_duration(opening_time, closing_time):
    #Input: two string: opening_time and closing_time
    #Output: the opening duration in hours
    #Return -1 if any time is in incorrect form.
    str_format = "%H:%M:%S"
    try:
        a = datetime.datetime.strptime(closing_time, str_format) - datetime.datetime.strptime(opening_time, str_format)
        print(a.total_seconds()/3600)
    except ValueError:
        print(-1)
print(compute_opening_duration('5:30:00', '16:00:00'))
#my 2nd task is to compile all the values of that function above and then put it an array
#then convert that into a numpy array and print out first 10 entries
a = list(compute_opening_duration(data_dict['Open'][i], data_dict['Close'][i]) for i in range (len(data_dict['Open'])))
a_numpyarray = np.asarray(a)
print(a_numpyarray[0:11])
i expected it to be numbers
but the actual output is: [None None None None None None None None None None None]
import datetime
import numpy as np
def compute_opening_duration(opening_time, closing_time):
    #Input: two string: opening_time and closing_time
    #Output: the opening duration in hours
    #Return -1 if any time is in incorrect form.
    str_format = "%H:%M:%S"
    try:
        a = datetime.datetime.strptime(closing_time, str_format) - datetime.datetime.strptime(opening_time, str_format)
        return a.total_seconds()/3600
    except ValueError:
        return -1
# print(compute_opening_duration('5:30:00', '16:00:00'))
#my 2nd task is to compile all the values of that function above and then put it an array
#then convert that into a numpy array and print out first 10 entries
data_dict={
"Open" :['5:30:00'],
"Close": ['16:00:00']
}
a = list(compute_opening_duration(data_dict['Open'][i], data_dict['Close'][i]) for i in range (len(data_dict['Open'])))
a_numpyarray = np.asarray(a)
print(a_numpyarray[0:11]) #[10.5]
Remove print from the last line. Just keep this :
compute_opening_duration('5:30:00', '16:00:00')
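The underlying reason, shown as a tiny sketch: a function that only prints has no return statement, so Python hands back None to the caller, and that None is what ends up in the list. The function names and values below are made up for the illustration.

```python
def shows_only(hours):
    # Prints the value but returns nothing, so the caller receives None.
    print(hours)


def gives_back(hours):
    # Returns the value, so the caller can store it.
    return hours


collected_none = [shows_only(h) for h in (10.5, 8.0)]
collected = [gives_back(h) for h in (10.5, 8.0)]
print(collected_none)
print(collected)
```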

Resources