Web scraping a table from a dynamic website (Python)

I am fairly new to web scraping and decided to dive straight into the deep end. I want to select any product and "all months" in the dropdown above the table at https://www.cmegroup.com/tools-information/quikstrike/options-calendar.html and extract the table data into a CSV file. The problem arises because the website is dynamic (not all of the HTML is displayed when I inspect the source in the browser) and generates the table in CSS (from what I managed to understand). I tried using Selenium to load the webpage, but I am getting an error.
[12508:8412:0216/220631.827:ERROR:ssl_client_socket_impl.cc(985)] handshake failed; returned -1, SSL error code 1, net_error -101
I am assuming this has to do with the webdriver initialisation and I need to give it some settings, just not sure which ones.
Here is the code:
from selenium import webdriver
from bs4 import BeautifulSoup
# Set up the Selenium driver
driver = webdriver.Chrome()
# Open the webpage
url = 'https://www.cmegroup.com/tools-information/quikstrike/options-calendar.html'
driver.get(url)
# Render the page and extract the HTML code
html = driver.page_source
# Parse the HTML using BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
# Extract the data you want from the soup object
tables = soup.findAll("table")
print(tables)
# Close the Selenium driver
driver.quit()
I have tried going the short route and reproducing the requests made by the browser and catching the response with the HTML code (yes, the request only returns HTML, not JSON), but this backfired as I couldn't reproduce the payload. How do I get the data from the calendar?

The <table> element is within an <iframe>, so to access/print the <table> contents you have to:
Use WebDriverWait to wait for the desired frame to be available and switch to it.
Use WebDriverWait to wait for visibility_of_element_located on the table body.
You can use the following locator strategies:
driver.get('https://www.cmegroup.com/tools-information/quikstrike/options-calendar.html')
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#onetrust-accept-btn-handler"))).click()
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH,"//iframe[@class='cmeIframe']")))
print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='ui-widget-info clearfix']/table//tbody[not(@class)][.//tr[@class='group compact']]"))).text)
Console output:
FEBRUARY 2023
FIRST AVAIL DATE OPTION EXPIRATION DTE PRODUCT OPTION FUTURE FUTURE EXPIRATION DTE
20 Jan 2023 - Fri 17 Feb 2023 - Fri 1 Natural Gas Weekly Financial Option Week 3 LN3G3 NGH3 24 Feb 2023 - Fri 8
24 Nov 2010 - Wed 23 Feb 2023 - Thu 7 Natural Gas Option (European) LNEH3 NGH3 24 Feb 2023 - Fri 8
27 Jan 2023 - Fri 24 Feb 2023 - Fri 8 Natural Gas Weekly Financial Option Week 4 LN4G3 NGJ3 29 Mar 2023 - Wed 41
MARCH 2023
FIRST AVAIL DATE OPTION EXPIRATION DTE PRODUCT OPTION FUTURE FUTURE EXPIRATION DTE
03 Feb 2023 - Fri 03 Mar 2023 - Fri 15 Natural Gas Weekly Financial Option Week 1 LN1H3 NGJ3 29 Mar 2023 - Wed 41
10 Feb 2023 - Fri 10 Mar 2023 - Fri 22 Natural Gas Weekly Financial Option Week 2 LN2H3 NGJ3 29 Mar 2023 - Wed 41
21 Feb 2023 - Tue 17 Mar 2023 - Fri 29 Natural Gas Weekly Financial Option Week 3 LN3H3 NGJ3 29 Mar 2023 - Wed 41
27 Feb 2023 - Mon 24 Mar 2023 - Fri 36 Natural Gas Weekly Financial Option Week 4 LN4H3 NGJ3 29 Mar 2023 - Wed 41
24 Nov 2010 - Wed 28 Mar 2023 - Tue 40 Natural Gas Option (European) LNEJ3 NGJ3 29 Mar 2023 - Wed 41
06 Mar 2023 - Mon 31 Mar 2023 - Fri 43 Natural Gas Weekly Financial Option Week 5 LN5H3 NGK3 26 Apr 2023 - Wed 69
APRIL 2023
FIRST AVAIL DATE OPTION EXPIRATION DTE PRODUCT OPTION FUTURE FUTURE EXPIRATION DTE
13 Mar 2023 - Mon 06 Apr 2023 - Thu 49 Natural Gas Weekly Financial Option Week 1 LN1J3 NGK3 26 Apr 2023 - Wed 69
20 Mar 2023 - Mon 14 Apr 2023 - Fri 57 Natural Gas Weekly Financial Option Week 2 LN2J3 NGK3 26 Apr 2023 - Wed 69
27 Mar 2023 - Mon 21 Apr 2023 - Fri 64 Natural Gas Weekly Financial Option Week 3 LN3J3 NGK3 26 Apr 2023 - Wed 69
24 Nov 2010 - Wed 25 Apr 2023 - Tue 68 Natural Gas Option (European) LNEK3 NGK3 26 Apr 2023 - Wed 69
03 Apr 2023 - Mon 28 Apr 2023 - Fri 71 Natural Gas Weekly Financial Option Week 4 LN4J3 NGM3 26 May 2023 - Fri 99
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
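Since the goal in the question is a CSV file rather than printed text, here is a rough sketch of one way to turn table HTML into CSV using only the standard library. It assumes that, after switching into the iframe, driver.page_source gives you the frame's HTML. The class name TableRows, the function table_html_to_csv, and the sample markup are all made up for illustration and are not taken from the real page.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_html_to_csv(html):
    """Return the rows of all tables in `html` as CSV text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Hypothetical sample markup, NOT the real page source.
sample = """
<table><tbody>
<tr><th>OPTION EXPIRATION</th><th>PRODUCT</th><th>OPTION</th></tr>
<tr><td>17 Feb 2023 - Fri</td><td>Natural Gas Weekly Financial Option Week 3</td><td>LN3G3</td></tr>
</tbody></table>
"""
csv_text = table_html_to_csv(sample)
print(csv_text)
```

To save the result, write csv_text to a file opened with newline="". The real table mixes month headings and column headers, so the rows will probably need some filtering first.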

Related

how to compare two lists of lists

I'm using Groovy scripting in SoapUI and I'm facing a comparison problem between two lists of lists:
on the one hand I have the expected value (type String), on the other I have the recovered value (type ArrayList).
The expected value looks like this:
expected [[hsb:[100, 100, 100], type:hsb, wt:null], [hsb:null, type:wt, wt:60]]
The recovered one looks like this:
recovered [[type:hsb, wt:null, hsb:[100, 100, 100]], [type:wt, wt:60, hsb:null]]
so basically it should match, however I can't figure out how to do it programmatically.
If I convert "expected" into an array, using expected_collection = expected_value.tokenize('[]'), I lose my elements and parsing the array gives the following :
Thu Nov 14 16:42:31 CET 2019: INFO: hsb:
Thu Nov 14 16:42:31 CET 2019: INFO: 100, 100, 100
Thu Nov 14 16:42:31 CET 2019: INFO: , type:hsb, wt:null
Thu Nov 14 16:42:31 CET 2019: INFO: ,
Thu Nov 14 16:42:31 CET 2019: INFO: hsb:null, type:wt, wt:60
Is it possible to tokenize on a limited level, i.e. only the first level of []?

Why is sorting the timestamp using sort_values not working?

I have a column of timestamps converted to human-readable form.
I have tried sorting it both as epoch time and after converting. It gives me
Fri, 08 Feb 2019 17:24:16 IST
Mon, 11 Feb 2019 02:19:40 IST
Sat, 09 Feb 2019 00:22:43 IST
which is not sorted.
I have used sort_values()
each_tracker_df = each_tracker_df.sort_values(["timestamp"],ascending=True)
Why isn't it working?
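Since all the times are in IST, remove the string IST (replace it with an empty string), then parse the remaining text into datetime objects and sort those.
>>> import datetime
>>> times = ['Fri, 10 Feb 2010 17:24:16', 'Fri, 11 Feb 2010 17:24:16', 'Fri, 11 Feb 2019 17:24:16']
>>> change_format = []
>>> for time in times:
...     # Parse each string into a datetime so it sorts chronologically
...     change_format.append(datetime.datetime.strptime(time, '%a, %d %b %Y %H:%M:%S'))
...
>>> change_format.sort()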
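To see why the original sort looks wrong, here is a small sketch using the three timestamps from the question. The column holds strings, so they are compared alphabetically ("Fri" < "Mon" < "Sat") rather than by date. The helper name parse_ist is made up for this example, and it assumes an English locale for the day and month names.

```python
from datetime import datetime

stamps = [
    "Mon, 11 Feb 2019 02:19:40 IST",
    "Fri, 08 Feb 2019 17:24:16 IST",
    "Sat, 09 Feb 2019 00:22:43 IST",
]


def parse_ist(stamp):
    """Drop the trailing ' IST' and parse the rest into a datetime."""
    return datetime.strptime(stamp.replace(" IST", ""), "%a, %d %b %Y %H:%M:%S")


# Sorting the raw strings compares them character by character.
as_text = sorted(stamps)

# Sorting with a datetime key compares the actual points in time.
as_time = sorted(stamps, key=parse_ist)

print(as_text)
print(as_time)
```

The first list reproduces the Fri, Mon, Sat order from the question; the second is in chronological order. In pandas the same idea would be to convert the column with pd.to_datetime before calling sort_values.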

I want to find difference between 2 numbers stored in a file using shell script

Below is the content of the file. I want to find the difference between the first field of each line and the first field of the line before it.
0.607401 # Tue Mar 27 04:30:01 IST 2018
0.607401 # Tue Mar 27 04:35:02 IST 2018
0.606325 # Tue Mar 27 04:40:02 IST 2018
0.606223 # Tue Mar 27 04:45:01 IST 2018
0.606167 # Tue Mar 27 04:50:02 IST 2018
0.605716 # Tue Mar 27 04:55:01 IST 2018
0.605716 # Tue Mar 27 05:00:01 IST 2018
0.607064 # Tue Mar 27 05:05:01 IST 2018
output:-
0
-0.001076
-0.000102
.019944
..
..
.001348
CODE:
awk '{s=$0;getline;print s-$0;next}' a.txt
However this does not work as expected...
Could you help me please?
You can use the following awk code:
$ awk 'NR==1{save=$1;next}NR>1{printf "%.6f\n",($1-save);save=$1}' file
0.000000
-0.001076
-0.000102
-0.000056
-0.000451
0.000000
0.001348
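You can format the output as you want by modifying the printf.
Your current approach skips lines: getline reads the next record, so the script only compares lines 1 and 2, then lines 3 and 4, and so on. It also computes previous minus current, which is the opposite sign of your expected output.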
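If it helps to see the same logic outside awk, here is a rough Python equivalent of the one-liner above, run on the sample lines from the question. The function name consecutive_differences is made up for this sketch.

```python
def consecutive_differences(lines):
    """Return current-minus-previous differences of the first field."""
    values = [float(line.split()[0]) for line in lines if line.strip()]
    # Pair every value with its successor, so no line is skipped.
    return [f"{cur - prev:.6f}" for prev, cur in zip(values, values[1:])]


sample = """\
0.607401 # Tue Mar 27 04:30:01 IST 2018
0.607401 # Tue Mar 27 04:35:02 IST 2018
0.606325 # Tue Mar 27 04:40:02 IST 2018
0.606223 # Tue Mar 27 04:45:01 IST 2018
0.606167 # Tue Mar 27 04:50:02 IST 2018
0.605716 # Tue Mar 27 04:55:01 IST 2018
0.605716 # Tue Mar 27 05:00:01 IST 2018
0.607064 # Tue Mar 27 05:05:01 IST 2018
""".splitlines()

diffs = consecutive_differences(sample)
print("\n".join(diffs))
```

Eight input lines give seven differences, the same values as the awk output above.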

time.mktime(datetime.timetuple()) seems behaving incorrectly

Hi, I have a time converted into GMT as below:
2016-11-18 13:00:00+00:00
I want to convert this into milliseconds, which I am doing as below:
epoch = int(time.mktime(datetime_in_gmt.timetuple()))
>>>print(epoch)
1479454200
When I paste this epoch (1479454200) into http://www.epochconverter.com/, I get the following result:
GMT: Fri, 18 Nov 2016 07:30:00 GMT
Your time zone: Friday 18 November 2016 01:00:00 PM IST GMT+5:30
I don't understand why I am getting 18 Nov 07:30 as GMT when my GMT time was 2016-11-18 13:00:00+00:00.
Any suggestions?
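time.mktime() interprets the time tuple as local time, and timetuple() drops the UTC offset, so 13:00 is read as 13:00 IST, which is 07:30 GMT. Instead of timetuple(), use timestamp(), which honours the datetime's timezone.
Something like: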
int(datetime_in_gmt.timestamp()) * 1000
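Here is a small self-contained check of that suggestion, assuming datetime_in_gmt is a timezone-aware datetime in UTC as the printed value in the question implies. calendar.timegm is shown as an alternative that reads a time tuple as UTC instead of local time.

```python
import calendar
from datetime import datetime, timezone

# Same instant as in the question: 2016-11-18 13:00:00+00:00
datetime_in_gmt = datetime(2016, 11, 18, 13, 0, 0, tzinfo=timezone.utc)

# timestamp() respects tzinfo, so the result does not depend on the
# machine's local time zone.
epoch_seconds = int(datetime_in_gmt.timestamp())
epoch_millis = epoch_seconds * 1000

# calendar.timegm() treats the tuple as UTC (time.mktime() treats it as local).
epoch_via_timegm = calendar.timegm(datetime_in_gmt.timetuple())

print(epoch_seconds, epoch_millis, epoch_via_timegm)
```

The value is 19800 seconds (5 hours 30 minutes, the IST offset) larger than the 1479454200 that mktime returned in the question.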

Excel:Next year period

How to get next year period based on current month and year, for example:
Jan 2014 - Dec 2014
Feb 2014 - Jan 2015
Mar 2014 - Feb 2015
Apr 2014 - Mar 2015
May 2014 - Apr 2015
Jun 2014 - May 2015
Jul 2014 - Jun 2015
Aug 2014 - Jul 2015
Sep 2014 - Aug 2015
Oct 2014 - Sep 2015
Nov 2014 - Oct 2015
Dec 2014 - Nov 2015
Next period
Jan 2015 - Dec 2015
Feb 2015 - Jan 2016
etc.
I have tried with the following formula:
=UPPER(TEXT(NOW();"MMM")) &" "& TEXT(NOW();"YY")-1
It works fine for Jan 2014 but can't figure out how to get Dec 2014; Feb 2014 - Jan 2015 and so on?
I think you need the EOMONTH function.
=EOMONTH(NOW(),-13) +1 and =EOMONTH(NOW(),-2) +1 should give you JAN 2014 to DEC 2014.
From the MS Excel documentation:
Microsoft Excel stores dates as sequential serial numbers so they can
be used in calculations. By default, January 1, 1900 is serial number
1, and January 1, 2008 is serial number 39448 because it is 39,448
days after January 1, 1900.
To get the text formatting you are after, I would suggest that you stick with formatting the cell/column as @Makyen has suggested. Having said that, this is the formula that you can use to format the text.
=UPPER(TEXT(EOMONTH(NOW(),-13) +1, "MMM YY"))
Assuming that the date (as a date serial number) for which you desire to find the year period is in cell A1, the following should provide the next year period starting from that day:
=EOMONTH(A1,11) +DAY(A1) -1
Examples:
Input Output
1/18/2014 1/17/2015
2/18/2014 2/17/2015
3/18/2014 3/17/2015
4/18/2014 4/17/2015
5/18/2014 5/17/2015
6/18/2014 6/17/2015
7/18/2014 7/17/2015
8/18/2014 8/17/2015
9/18/2014 9/17/2015
10/18/2014 10/17/2015
11/18/2014 11/17/2015
12/18/2014 12/17/2015
1/18/2015 1/17/2016
2/18/2015 2/17/2016
3/18/2015 3/17/2016
4/18/2015 4/17/2016
5/18/2015 5/17/2016
6/18/2015 6/17/2016
7/18/2015 7/17/2016
8/18/2015 8/17/2016
9/18/2015 9/17/2016
10/18/2015 10/17/2016
11/18/2015 11/17/2016
12/18/2015 12/17/2016
1/18/2016 1/17/2017
If you want the year period to start from the current day:
=EOMONTH(NOW(),11) + DAY(NOW()) -1
If you want the year period to start from the first day of the current month:
=EOMONTH(EOMONTH(NOW(),-1) + 1,11)
or
=EOMONTH(NOW() - DAY(NOW()) + 1,11)
The EOMONTH() function:
EOMONTH(start_date,months)
Returns the serial number for the last day of the month that is the
indicated number of months before or after start_date. Use EOMONTH to
calculate maturity dates or due dates that fall on the last day of the
month.
If this function is not available, and returns the #NAME? error,
install and load the Analysis ToolPak add-in.
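To check the date arithmetic outside Excel, here is a rough Python sketch of the formula =EOMONTH(A1,11) + DAY(A1) - 1 from above. The function names eomonth and year_period_end are made up for this example; eomonth only imitates the documented behaviour of Excel's EOMONTH on Python date objects and is not an Excel API.

```python
import calendar
from datetime import date, timedelta


def eomonth(start, months):
    """Last day of the month that is `months` months before/after `start`."""
    # Count months from year 0 so negative offsets roll over years correctly.
    index = start.year * 12 + (start.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, last_day)


def year_period_end(start):
    """Mirror of =EOMONTH(A1,11) + DAY(A1) - 1."""
    return eomonth(start, 11) + timedelta(days=start.day - 1)


for start in (date(2014, 1, 18), date(2014, 2, 18), date(2015, 12, 18)):
    print(start, "->", year_period_end(start))
```

These three inputs correspond to rows in the examples table above and give 17 Jan 2015, 17 Feb 2015 and 17 Dec 2016.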
