Add a unique timestamp to a text using Selenium and Python3 - python-3.x

Quick question which I thought I'd find a simple answer to but so far, no luck.
I'm creating automation to go through sign-up forms, and for this I ideally need to send some unique text. I was thinking of something like name + timestamp (as is doable on Ghost Inspector).
Currently I'm okay with just writing a quick unique name each time in my code (using send_keys('')); I'm just looking to see if there's a way to cut this small chore out.

To add a unique timestamp to a text you can use Python's strftime() method from the time module as follows:
Code Block:
from selenium import webdriver
from time import gmtime, strftime

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument('disable-infobars')
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get("http://www.google.com")
# Append the current timestamp to the text before sending it
driver.find_element_by_name("q").send_keys("James Stott {}".format(strftime("%Y-%m-%d %H:%M:%S", gmtime())))

You can use this code to generate random strings, which you can then pass to the send_keys("") command.
import string
import random

def random_string(length):
    return ''.join(random.choice(string.ascii_letters) for m in range(length))

print(random_string(10))
print(random_string(5))
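For example, the helper could be used to build a unique name to type into the form, which can then be passed to send_keys(); a minimal sketch (the "James" prefix is just an illustration):
# Build a unique value such as "JamesKqZpT" on each run, using random_string() as defined above
unique_name = "James" + random_string(5)
print(unique_name)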
If you want to use a UUID:
import uuid

id = uuid.uuid1()
print(id.hex)  # hex representation
print(id.int)  # integer representation
If you want to use a timestamp, you can use this code:
import datetime

print('Date now: %s' % datetime.datetime.now())
Here is the reference link: datetime python
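Tying this back to the original sign-up question, a minimal sketch that appends the current timestamp to a name before sending it; the URL and the field name full_name are placeholders, not taken from the question:
import datetime
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://example.com/signup")  # placeholder URL

# Produces something like "James Stott 20200401123000", unique per run (to the second)
unique_name = "James Stott {}".format(datetime.datetime.now().strftime("%Y%m%d%H%M%S"))
driver.find_element_by_name("full_name").send_keys(unique_name)  # "full_name" is a hypothetical field name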

Related

Scrape Product Image with BeautifulSoup (Error)

I need your help. I'm working on a Telegram bot which sends me all the sales from Amazon.
It works well, but this function doesn't work properly. I always get the same error, which blocks the script:
imgs_str = img_div.img.get('data-a-dynamic-image') # a string in Json format
AttributeError: 'NoneType' object has no attribute 'img'
def take_image(soup):
    img_div = soup.find(id="imgTagWrapperId")
    imgs_str = img_div.img.get('data-a-dynamic-image')  # a string in JSON format
    # convert to a dictionary
    imgs_dict = json.loads(imgs_str)
    # each key in the dictionary is a link to an image, and the value shows its size (print the whole dictionary to inspect)
    num_element = 0
    first_link = list(imgs_dict.keys())[num_element]
    return first_link
I still don't understand how to solve this issue.
Thanks to all!
From the looks of the error, soup.find didn't find anything.
Have you tried using images = soup.findAll("img", {"id": "imgTagWrapperId"})?
This will return a list.
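Either way, the AttributeError means img_div ended up as None, so guarding against the missing element avoids the crash. A minimal sketch building on the question's own function:
import json

def take_image(soup):
    img_div = soup.find(id="imgTagWrapperId")
    if img_div is None or img_div.img is None:
        # The element was not in the fetched HTML (e.g. it is loaded later by JavaScript)
        return None
    imgs_str = img_div.img.get('data-a-dynamic-image')  # a JSON-formatted string
    imgs_dict = json.loads(imgs_str)
    return list(imgs_dict.keys())[0]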
Images are not embedded in the HTML page, they are linked from it, so you need to wait until they are loaded. Here I will give you two options:
1) (not recommended, because there is a margin of error) you can simply wait a fixed amount of time for the image to load (for this you can use time.sleep()).
2) (recommended) I would rather use Selenium WebDriver. You also have to wait when you use Selenium, but the good thing is that Selenium has a dedicated function for this job.
I will show how to do it with Selenium:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

browser = webdriver.Chrome()
browser.get("url")
delay = 3  # seconds
try:
    # Wait until the element you want (here, the image wrapper) is present in the DOM
    myElem = WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, 'imgTagWrapperId')))
    print("Page is ready!")
except TimeoutException:
    print("Loading took too much time!")
More documentation
Code example for way 1
Q/A for way 2
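In case the linked example is unavailable, a rough sketch of way 1 (the fixed time.sleep() wait, not recommended for the reason given above) might look like this:
import time
from bs4 import BeautifulSoup
from selenium import webdriver

browser = webdriver.Chrome()
browser.get("url")  # placeholder URL, as in the snippet above
time.sleep(3)       # crude fixed wait; may be too short or needlessly long
soup = BeautifulSoup(browser.page_source, "html.parser")
img_div = soup.find(id="imgTagWrapperId")  # should be present once the page has finished loading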

Understanding argparse to get dynamic maps with Geo-Location of tweets

I have found this Python code online (twitter_map_clustered.py) which (I think) helps create a map using the geodata of different tweets:
from argparse import ArgumentParser
import folium
from folium.plugins import MarkerCluster
import json

def get_parser():
    parser = ArgumentParser()
    parser.add_argument('--geojson')
    parser.add_argument('--map')
    return parser

def make_map(geojson_file, map_file):
    tweet_map = folium.Map(location=[50, 5], max_zoom=20)
    marker_cluster = MarkerCluster().add_to(tweet_map)
    geodata = json.load(open(geojson_file))
    for tweet in geodata['features']:
        tweet['geometry']['coordinates'].reverse()
        marker = folium.Marker(tweet['geometry']['coordinates'], popup=tweet['properties']['text'])
        marker.add_to(marker_cluster)
    # Save to HTML map file
    tweet_map.save(map_file)

if __name__ == '__main__':
    parser = get_parser()
    args = parser.parse_args()
    make_map(args.geojson, args.map)
I managed to extract the geo information of different tweets and save it into a geo_data.json file. However, I have trouble understanding the code, especially the function get_parser().
It seems that we need to add arguments when running the file from the command prompt. The argument should be geo_data.json. However, it is also asking for a map: parser.add_argument('--map').
Why is that the case? In the code, aren't we creating the map here?
#Save to HTML map file
tweet_map.save(map_file)
Can you please help me? How would you run the Python script? Is there anything important I am missing?
As explained in the argparse documentation, it simply asks for the name of the geojson file and a name that your code will use to save the map.
Therefore, you will run:
python twitter_map_clustered.py --geojson geo_data.json --map mymap.html
and you will get a map named mymap.html.
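To make the two flags more concrete, here is a small sketch of what get_parser() produces; the list passed to parse_args() simulates the command line above:
from argparse import ArgumentParser

parser = ArgumentParser()
parser.add_argument('--geojson')  # input file with the tweets' geo data
parser.add_argument('--map')      # output HTML file the map will be saved to

# Simulates: python twitter_map_clustered.py --geojson geo_data.json --map mymap.html
args = parser.parse_args(['--geojson', 'geo_data.json', '--map', 'mymap.html'])
print(args.geojson)  # geo_data.json -> passed to json.load(open(...))
print(args.map)      # mymap.html    -> passed to tweet_map.save(...)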

How to extract image src from the website

I tried scraping the table rows from the website to get the data on coronavirus spread.
I wanted to extract the src for all the img tags, so as to get the source of each flag's image along with all the data for each country. Could someone help?
import pandas as pd
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

options = Options()
options.add_argument('--headless')
driver = webdriver.Firefox(options=options)
driver.get("https://google.com/covid19-map/?hl=en")
df = pd.read_html(driver.page_source)[1]
df.to_csv("Data.csv", index=False)
driver.quit()
While Gareth's answer has already been accepted, it inspired me to write this one from a pandas point of view. Since we know the URL for the flags follows a fixed pattern, and the only thing that changes is the name, we can create a new column by lowercasing the name, replacing spaces with underscores, and then weaving the name into the fixed URL pattern.
import pandas as pd
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

options = Options()
options.add_argument('--headless')
driver = webdriver.Firefox(options=options)
driver.get("https://google.com/covid19-map/?hl=en")
df = pd.read_html(driver.page_source)[1]
# Build the flag URL from the location name: lowercase it and replace spaces with underscores
df['flag_url'] = df.apply(lambda row: f"https://www.gstatic.com/onebox/sports/logos/flags/{row.Location.lower().replace(' ', '_')}_icon_square.svg", axis=1)
df.to_csv("Data.csv", index=False)
driver.quit()
OUTPUT SAMPLE
Location,Confirmed,Cases per 1M people,Recovered,Deaths,flag_url
Worldwide,882068,125.18,185067,44136,https://www.gstatic.com/onebox/sports/logos/flags/worldwide_icon_square.svg
United Kingdom,29474,454.19,135,2352,https://www.gstatic.com/onebox/sports/logos/flags/united_kingdom_icon_square.svg
United States,189441,579.18,7082,4074,https://www.gstatic.com/onebox/sports/logos/flags/united_states_icon_square.svg
Not the most ingenious way, but since you have the page source already, how about using a regex to match the URLs of the images?
import re
print (re.findall(r'https://www.gstatic.com/onebox/sports/logos/flags/.+?.svg', driver.page_source))
The image links are in order so it matches the order of confirmed cases - except that on my computer, the country I'm in right now is at the top of the list.
If this is not what you want, I can delete this answer.
As mentioned by @Chris Doyle in the comments, this can even be done simply by noticing the URLs are the same, with ".+?" replaced by the country's name (all lowercase, connected with underscores). You have that information in the CSV file.
country_name = "United Kingdom"
url = "https://www.gstatic.com/onebox/sports/logos/flags/"
url += '_'.join(country_name.lower().split())
url += '.svg'
print (url)
Also be sure to check out his answer using pure pandas :)

Using Python, Selenium, and BeautifulSoup to scrape for content of a tag?

Relative beginner here. There are similar topics to this, and I can see that my solution works; I just need help connecting these last few dots. I'd like to scrape follower counts from Instagram without using the API. Here's what I have so far:
Python 3.7.0
from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
# > DevTools listening on ws://.......
driver.get("https://www.instagram.com/cocacola")
soup = BeautifulSoup(driver.page_source)
elements = soup.find_all(attrs={"class": "g47SY "})
# Note the full class is 'g47SY lOXF2' but I can't get this to work
for element in elements:
    print(element)
# > [<span class="g47SY ">667</span>,
# >  <span class="g47SY " title="2,598,456">2.5m</span>,   <-- Need what's in title, 2,598,456
# >  <span class="g47SY ">582</span>]

for element in elements:
    t = element.get('title')
    if t:
        count = t
        count = count.replace(",", "")
    else:
        pass
print(int(count))
# > 2598456  (Success)
Is there an easier or quicker way to get to the 2,598,456 number? My original hope was that I could just use the class 'g47SY lOXF2', but class names with spaces don't work in BS4 as far as I'm aware. I just want to make sure this code is succinct and functional.
I had to use the headless option and add executable_path for testing. You can remove those.
from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(executable_path="chromedriver.exe", chrome_options=options)
driver.get('https://www.instagram.com/cocacola')
soup = BeautifulSoup(driver.page_source, 'lxml')

# span[title] alone would give multiple results; the follower count span sits inside
# an <a> tag, so 'a > span[title]' narrows it down to the one we want.
followers = soup.select_one('a > span[title]')['title'].replace(',', '')
print(followers)
# Output: 2598552
You could use a regular expression to get the number.
Try this:
import re

followerRegex = re.compile(r'title="((\d){1,3}(,)?)+')
followerCount = followerRegex.search(str(elements))
result = followerCount.group().strip('title="').replace(',', '')
print(result)

How do I relocate/disable GeckoDriver's log file in selenium, python 3?

Ahoy, how do I disable GeckoDriver's log file in selenium, python 3?
If that's not possible, how do I relocate it to Temp files?
To relocate the GeckoDriver logs, you can create a directory within your project space, e.g. Log, and use the argument log_path to store the GeckoDriver logs in a file as follows:
from selenium import webdriver
driver = webdriver.Firefox(executable_path=r'C:\path\to\geckodriver.exe', log_path='./Log/geckodriver.log')
driver.get('https://www.google.co.in')
print("Page Title is : %s" %driver.title)
driver.quit()
ref: 7. WebDriver API > Firefox WebDriver
According to the documentation, you can redirect it (here, to os.devnull, which effectively disables it) as follows:
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
import os

options = Options()
# geckodriver_path should point to your geckodriver executable
driver = webdriver.Firefox(executable_path=geckodriver_path, service_log_path=os.path.devnull, options=options)
The following arguments are deprecated:
firefox_options – deprecated argument; use options instead
log_path – deprecated argument; use service_log_path instead
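If the goal is to actually move the log into the system Temp folder rather than discard it, a minimal sketch using the same service_log_path argument (this assumes geckodriver is on your PATH):
import os
import tempfile
from selenium import webdriver

# Write the GeckoDriver log into the system temp directory instead of the working directory
log_file = os.path.join(tempfile.gettempdir(), 'geckodriver.log')
driver = webdriver.Firefox(service_log_path=log_file)
driver.get('https://www.google.co.in')
driver.quit()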
Using WebDriver(log_path=path.devnull) and WebDriver(service_log_path=path.devnull) are both deprecated at this point in time; both result in a warning.
Using a Service object is now the preferred way of doing this:
from os import path
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.firefox.webdriver import WebDriver
service = Service(log_path=path.devnull)
driver = WebDriver(service=service)
driver.close()
You should be using service_log_path; as of today, log_path is deprecated. Example with pytest:
import pytest
from selenium import webdriver

@pytest.mark.unit
@pytest.fixture
def browser(pytestconfig):
    """
    Args:
        pytestconfig (_pytest.config.Config)
    """
    driver_name = pytestconfig.getoption('browser_driver')
    driver = getattr(webdriver, driver_name)
    driver = driver(service_log_path='artifacts/web_driver-%s.log' % driver_name)
    driver.implicitly_wait(10)
    driver.set_window_size(1200, 800)
    yield driver
    driver.quit()
No @hidehara, but I found a way to do it. I looked up the __init__ file in the Selenium2Library directory, in my case: C:\Users\Eigenaardig\AppData\Local\Programs\Python\Lib\site-packages\SeleniumLibrary
There I added these 2 lines:
from selenium import webdriver
driver = webdriver.Firefox(executable_path=r'C:\Users\Eigenaar\eclipse-workspace\test\test\geckodriver.exe', log_path='./Log/geckodriver.log')
created the directory Log (in Windows Explorer).
Unfortunately, that started 2 instances.
I added a separate library (.py file),
which looks like this (for test purposes):
import time
import random
from selenium import webdriver

driver = webdriver.Firefox(executable_path=r'C:\Users\specimen\RobotFrameWorkExperienced\RobotLearn\Log\geckodriver.exe', service_log_path='./Log/geckodriver.log')

class CustomLib:
    ROBOT_LIBRARY_SCOPE = 'RobotLearn'
    num = random.randint(1, 18)
    if num % 2 == 0:
        def get_current_time_as_string(self):
            localtime = time.localtime()
            formatted_time = time.strftime("%Y%m%d%H%M%S", localtime)
            return formatted_time
    else:
        def get_current_time_as_string(self):
            localtime = time.localtime()
            formatted_time = time.strftime("%S%m%d%H%M%Y", localtime)
            return formatted_time
But now it opens up 2 instances:
one runs correctly,
the other stays open and does nothing further.
Help, help.
If for some reason all of that does not work (which was the case for us), then go to this (relative) directory:
C:\Users\yourname\AppData\Local\Programs\Python\Python38\Lib\site-packages\SeleniumLibrary\keywords\webdrivertools
There is a file called webdrivertools.py.
On line 157 you can edit:
service_log_path='./robots/robotsiot/Results/Results/Results/geckoresults', executable_path=executable_path,
Advantages:
#1 If you're using something like GitHub and you synchronize a directory, the log files are kept separate.
#2 The original file of the previous run gets overwritten (if that is what you want; in some cases that is exactly how you need it).
Note: the section above applies if you're using Firefox; if you are using another browser you'll have to edit a different line.
Note 2: this path overrides at a high level, so arguments in Eclipse -> Robot Framework will no longer have any effect.
Use this option with caution: it's sort of a last resort if the other options don't work!
