Web scraping DEEPL.com using VBA Excel and Selenium - excel

I'm trying to write a function that translates sentences in Excel using DeepL.com.
My approach is to use Selenium to scrape the site through Chrome (Internet Explorer is not supported by the site).
Public Function deepL(txt As String, inputLang As String, outputLang As String)
Dim url As String
Dim driver As New WebDriver
url = "https://www.deepl.com/translator#" & inputLang & "/" & outputLang & "/" & txt
driver.Start "Chrome"
driver.Timeouts.ImplicitWait = 5000
driver.Get url
deepL = driver.FindElementById("target-dummydiv").Text
driver.Close
End Function
----
Sub translating()
'test for word "probando" from "es" to "en"
'url: https://www.deepl.com/translator#es/en/probando
'it should return: "testing"
MsgBox (deepL("probando", "es", "en"))
End Sub
The problem occurs while the page is loading: the div that holds the translation is empty on load, so the read that follows the GET returns an empty string.
About one second later, the page refreshes with the correct result:
<div id="target-dummydiv" aria-hidden="true" class="lmt__textarea lmt__textarea_dummydiv" lang="en-US">testing</div>
I tried adding an implicit wait of 5 seconds to give the page time to load, but the result is the same.
What am I doing wrong?
EDIT: I found that the div with the translation has visibility: hidden. If I make it visible, the results are correct, but I don't know how to do that from my code.

OK, I found a solution:
Select the textarea that holds the translation and read it with .Attribute("value") instead of .Text:
deepL = driver.FindElementByCss("textarea.lmt__textarea.lmt__target_textarea.lmt__textarea_base_style").Attribute("value")
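Putting the fix into the whole function could look like the sketch below. It assumes the SeleniumBasic "Selenium Type Library" reference is set, that the CSS class names from the answer still match the page, and that a 10-second budget is enough; the name DeepLTranslate and the polling interval are illustrative, not part of the original code. Note that txt is still concatenated into the URL unencoded, as in the question.

```vba
' Sketch only: requires a reference to the "Selenium Type Library" (SeleniumBasic).
Public Function DeepLTranslate(txt As String, inputLang As String, outputLang As String) As String
    ' Selector taken from the answer above; DeepL may have changed its markup since.
    Const TARGET_CSS As String = "textarea.lmt__textarea.lmt__target_textarea.lmt__textarea_base_style"
    Const MAX_WAIT_MS As Long = 10000   ' assumption: arbitrary upper bound
    Const POLL_MS As Long = 250         ' assumption: arbitrary polling interval

    Dim driver As New WebDriver
    Dim result As String
    Dim elapsedMs As Long

    driver.Start "Chrome"
    driver.Get "https://www.deepl.com/translator#" & inputLang & "/" & outputLang & "/" & txt

    ' The translation arrives after the page has loaded, so poll the textarea's
    ' value until it is non-empty or the time budget is used up.
    Do
        ' Appending vbNullString turns a possible Null into an empty string.
        result = driver.FindElementByCss(TARGET_CSS).Attribute("value") & vbNullString
        If Len(Trim$(result)) > 0 Then Exit Do
        driver.Wait POLL_MS
        elapsedMs = elapsedMs + POLL_MS
    Loop While elapsedMs < MAX_WAIT_MS

    driver.Quit
    DeepLTranslate = result
End Function
```

Polling is used here because the implicit wait only covers locating the element, not waiting for its content to change.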

Related

Trying to fill in data and click on button on Chrome browser after opening URL

I'm using Excel VBA to open a URL in Chrome, fill in some data, and click a submit button.
I can write the command that opens the URL, but I'm not sure how to go on to enter the data and submit the form.
My code starts as
Sub OpenHyperlinkInChrome()
Dim chromeFileLocation As String
Dim hyperlink As String
hyperlink = "<URL OVER HERE>"
chromeFileLocation = """C:\Program Files\Google\Chrome\Application\chrome.exe"""
Shell (chromeFileLocation & "-url " & hyperlink)
driver.FindElementByName("trackingId").Value = "123"
driver.FindElementByID("epm-submit-button-announce").Click
End Sub
I get a syntax error on the "driver.FindElementByName."
My field HTML code reads as
<input type="text" placeholder="Example: BLUE_SHOES_08. This information is for your tracking purposes." name="trackingId" class="a-input-text a-span12">
The button HTML code reads as
<span id="epm-submit-button-announce" class="a-button-text a-text-center" aria-hidden="true">
Submit
</span>
How can I go about filling the form and submitting?
The problem is that you are calling the "driver" object without declaring it or assigning anything to it.
From the code it looks like you are trying to use Selenium. Have you installed Selenium?
After you install Selenium, your code should look like this:
Dim driver As Object
Set driver = CreateObject("Selenium.ChromeDriver")
driver.start "chrome"
driver.Get "<URL OVER HERE>"
driver.FindElementByName("trackingId").SendKeys ("123")
driver.FindElementByID("epm-submit-button-announce").Click

readonly cells ... .. ......

I need code for this.
What value format does the Excel cell need so that it can be entered into the field? Please give one example.
Below is my code. I have also provided an image that shows what the problem looks like. I can't simply use .SendKeys here, because the field is a date-time picker (date, month, and time), so please help me out.
If I remove the "readonly" attribute from the HTML by hand, it works fine, but that is not a proper solution. Can you edit the code to handle this?
Sub google_search()
Dim row As Integer
row = 2
Dim bot As WebDriver
Set bot = New WebDriver
Dim GenderDD As Selenium.WebElement
bot.Start "chrome"
bot.Get "https://abcd.com/"
bot.FindElementbyName("sample_cdate").SendKeys "Value"
Stop
End Sub
Here is the inspected HTML of the target field, for reference:
<input type="text" class="form-control datetimepicker" name="sample_cdate" id="sample_cdate" placeholder="Date and Time of Sample Collection" readonly="">
You should replace the .SendKeys() call with a script that sets the value:
'bot.FindElementbyName("patient_id").SendKeys Sheet1.Cells(row, 3).Value
bot.ExecuteScript "arguments[0].setAttribute('value', arguments[1])", _
Array(bot.FindElementById("sample_cdate"), _
Format(Sheet1.Cells(row, 16).Value, "yyyy-mm-ddThh:mm:ss"))
Because the element is readonly, you cannot type into it with .SendKeys(), just as you could not in a regular browser window, but you can use JavaScript to set its value attribute programmatically.
As you show, your input id may be id="sample_rdate", not sample_cdate.
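Since the asker reports that the field works once "readonly" is removed from the HTML, another option is to remove that attribute from the live page with a script and then type as usual. The sketch below assumes SeleniumBasic, the placeholder URL from the question, and that the value sits in row 2, column 16 of Sheet1; the Sub name and the date format string are illustrative guesses, since the format the date picker expects is not shown.

```vba
' Sketch only: requires a reference to the "Selenium Type Library" (SeleniumBasic).
Sub FillReadonlyDate()
    Dim bot As New WebDriver
    Dim dateBox As WebElement

    bot.Start "chrome"
    bot.Get "https://abcd.com/"     ' placeholder URL from the question

    Set dateBox = bot.FindElementByName("sample_cdate")

    ' Remove the readonly attribute in the loaded page so the field accepts typing.
    bot.ExecuteScript "arguments[0].removeAttribute('readonly');", dateBox

    ' Assumption: the picker accepts this layout; adjust the format string as needed.
    dateBox.SendKeys Format(Sheet1.Cells(2, 16).Value, "yyyy-mm-dd hh:nn:ss")

    Stop        ' pause to inspect the page, as in the original code
    bot.Quit
End Sub
```

Typing into the field may open the date picker widget, so setting the value by script as shown in the answer can still be the more predictable route.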

Click Java Button in URL with Excel VBA

I'm trying to download a table from a company website. I can download the first page, but I cannot jump to the second page.
HTML CODE for Page Number
1
[Image: HTML code for table]
The page numbers are inside the table and increase one by one. When page one is active, its link has no href and is shown as
<span>1</span>
I use the code below to click the page links, but I cannot get it to work.
Set doc = ie.document
i = 0
For Each link In doc.Links
'doing downloading stuff here
i = i + 1
link.innerText = "javascript:__doPostBack('ctl00$View$gv','Page$" & i
link.Click
Next
When I check the page, there is also a JavaScript function.
JavaScript code
//<![CDATA[
var theForm = document.forms['aspnetForm'];
if (!theForm) {
theForm = document.aspnetForm;
}
function __doPostBack(eventTarget, eventArgument) {
if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
theForm.__EVENTTARGET.value = eventTarget;
theForm.__EVENTARGUMENT.value = eventArgument;
theForm.submit();
}
}
//]]>
After the first page is downloaded, the macro clicks irrelevant page links, and it does not even click the same page each time.
Extra question
Also, is there any way to get the href values instead of the innerText in the code below?
User Name
Thanks
Open any page through a URL parameter:
Check whether you can open a page directly by passing the page number as a URL parameter, like this:
https://yourUrl.com?page=2
If so, walking through all pages is very easy. The only thing you must find out first is the number of pages, or some HTML that appears only when you try to open a page that does not exist.
How to get the href
You can't click innerText. It is only a string. You ask for a way to get the href, and that is the right idea. If you want the href of the first a-tag, you can use this:
'Part of your code to open the page
'...
Dim nodeFirstLink As Object
Set nodeFirstLink = doc.getElementsByTagName("a")(0)
Debug.Print nodeFirstLink.href
'More of your code
'...
Here is an example of how to change the href.
I don't know whether this also works with JS links:
Sub ChangeHref()
Dim htmlDoc As Object
Dim nodeFirstLink As Object
'Set a short HTML Document for this example
Set htmlDoc = CreateObject("HtmlFile")
htmlDoc.body.innerHTML = "<a href='https://amazon.com'>Amazon</a>"
Set nodeFirstLink = htmlDoc.getElementsByTagName("a")(0) 'Get the first Link
Debug.Print nodeFirstLink.outerhtml 'The HTML of the first link in the html document
Debug.Print nodeFirstLink.href 'Only the href of the first link in the html document
nodeFirstLink.href = "https://ebay.com" 'Changing the href in the first link
Debug.Print nodeFirstLink.outerhtml 'The innertext is still Amazon
Debug.Print nodeFirstLink.href 'The href is the new one
End Sub
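To read the href of every link rather than only the first, the same late-bound document can be looped over. This is a small sketch that reuses the HtmlFile approach from the example above; the two links and the Sub name ListAllHrefs are made up for illustration. On a real page you would loop over doc.getElementsByTagName("a") on the document you already have.

```vba
Sub ListAllHrefs()
    Dim htmlDoc As Object
    Dim link As Object

    ' Build a short HTML document with two links for this example.
    Set htmlDoc = CreateObject("HtmlFile")
    htmlDoc.body.innerHTML = "<a href='https://amazon.com'>Amazon</a>" & _
                             "<a href='https://ebay.com'>eBay</a>"

    ' Print the visible text and the href of each a-tag to the Immediate window.
    For Each link In htmlDoc.getElementsByTagName("a")
        Debug.Print link.innerText, link.href
    Next link
End Sub
```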

not able to switch between tabs in chrome using selenium VBA

Hi, I'm having difficulty switching between tabs in Chrome with Selenium in VBA.
I have this website: http://dgftebrc.nic.in:8100/BRCQueryTrade/index.jsp
There I need to enter the IEC code: 0906008051
Shipping Bill no :
3929815
3953913
3979509
Then the captcha has to be entered (I can handle this by giving the user 10 seconds).
After all this, I need to click "Show Details" while holding Ctrl so that it opens in a new tab, copy a specific piece of data from that tab, and then close it.
Then a new shipping bill number is taken from the Excel sheet and the process repeats.
I have managed this much code:
Option Explicit
Public Sub multipletabtest()
Dim bot As WebDriver
Dim keys As New Selenium.keys
Dim count As Long
Set bot = New WebDriver
bot.Start "Chrome"
'count = 1
'While (Len(Range("A" & count)) > 0)
bot.Get "http://dgftebrc.nic.in:8100/BRCQueryTrade/index.jsp"
bot.FindElementByXPath("//input[@type='text'][@name='iec']").SendKeys "0906008051"
bot.FindElementByXPath("//input[@type='text'][@name='sno']").SendKeys "3929815"
bot.Wait 10000 'Time to enter the captcha
bot.FindElementByCss("[value='Show Details']").SendKeys keys.Control, keys.Enter 'Take the value from final result sheet
bot.SwitchToNextWindow
ThisWorkbook.Sheets("Sheet1").Range("B1") = bot.FindElementByXPath("//text()[.='Used']/ancestor::td[1]").Text
'Range("B" & count) = bot.FindElementByXPath("//text()[.='Used']/ancestor::td[1]").Text 'To extract the data
'bot.Window.Close
bot.SwitchToPreviousWindow
bot.FindElementByXPath("//input[@type='text'][@name='sno']").Clear
bot.FindElementByXPath("//input[@type='text'][@name='sno']").SendKeys "3953913"
bot.FindElementByCss("[value='Show Details']").SendKeys keys.Control, keys.Enter
bot.SwitchToNextWindow
ThisWorkbook.Sheets("Sheet1").Range("B2") = bot.FindElementByXPath("//text()[.='Used']/ancestor::td[1]").Text
'Range("B" & count) = bot.FindElementByXPath("//text()[.='Used']/ancestor::td[1]").Text
'count = count + 1
'Wend
bot.Quit
End Sub
Please look into this and help me out.
Thanks .
XMLHTTP request:
I would sidestep this and also avoid the overhead of using a browser.
Make an initial GET XHR request to http://dgftebrc.nic.in:8100/BRCQueryTrade/brcIssuedTrade.jsp and extract the JSESSION cookie (you can probably use .getResponseHeader("Set-Cookie")). Then make a subsequent POST XHR request to the same URL, providing the cookie in the request headers and passing the relevant param values in the body.
The param values required are:
data = {
'iec': '0906008051',
'sno': '3929815',
'billid': '',
'brcstat': 'A',
'captext': 'a7m3p',
'B1': 'Show Details'
}
In VBA, the body passed to .send would look like:
"iec=0906008051&sno=3929815&billid&brcstat=A&captext=a7m3p&B1=Show Details"
Here iec and sno are dynamic, so you would concatenate them into the body of each request, perhaps in a loop:
"iec=" & iec & "&sno=" & sno & "&billid&brcstat=A&captext=" & capText & "&B1=Show Details"
If the captcha changes, you can prompt the user for the captext value and pass that in the body as well.
I don't think any additional headers are required, though you might add a User-Agent,
e.g
.setRequestHeader "User-Agent" , "Mozilla/5.0"
You can learn about xmlhttp (XHR) requests by searching Google for: vba jsession cookie
The response from the POST request will contain the html within which is your desired table(s).
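The request sequence described above might be sketched as follows. This is an untested outline under several assumptions: it swaps in WinHttp.WinHttpRequest.5.1 instead of extracting the cookie by hand, on the understanding that one WinHttp object re-sends cookies it received earlier; it URL-encodes the values with WorksheetFunction.EncodeURL, which needs Excel 2013 or later; and the function names BuildBody and PostBrcQuery are hypothetical. It does not address how the user gets to see the captcha that belongs to this session.

```vba
' Builds the form-encoded POST body from the parameter list in the answer.
Private Function BuildBody(iec As String, sno As String, capText As String) As String
    With Application.WorksheetFunction
        BuildBody = "iec=" & .EncodeURL(iec) & _
                    "&sno=" & .EncodeURL(sno) & _
                    "&billid=&brcstat=A" & _
                    "&captext=" & .EncodeURL(capText) & _
                    "&B1=" & .EncodeURL("Show Details")
    End With
End Function

' Returns the HTML of the response, which should contain the desired table(s).
Public Function PostBrcQuery(iec As String, sno As String, capText As String) As String
    Const URL As String = "http://dgftebrc.nic.in:8100/BRCQueryTrade/brcIssuedTrade.jsp"
    Dim http As Object

    Set http = CreateObject("WinHttp.WinHttpRequest.5.1")

    ' Initial GET so the server can issue its session cookie.
    http.Open "GET", URL, False
    http.Send

    ' POST on the same object; assumption: the session cookie is sent along.
    http.Open "POST", URL, False
    http.SetRequestHeader "Content-Type", "application/x-www-form-urlencoded"
    http.SetRequestHeader "User-Agent", "Mozilla/5.0"
    http.Send BuildBody(iec, sno, capText)

    PostBrcQuery = http.ResponseText
End Function
```

A call such as PostBrcQuery("0906008051", "3929815", capText) could then sit inside the loop over shipping bill numbers, with the returned HTML parsed for the "Used" cell.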
Selenium:
If you wish to continue with Selenium, and assuming you have enabled the Show Details button with your prior actions, you can use the following attribute = value selector:
bot.FindElementByCss("[value='Show Details']").click

How to find a table using selenium and vba on webpage that uses iframes?

The code below worked until a few days ago: it went to the URL, found the table, and imported the table's contents into Excel. I then did some other formatting to get the table into the appropriate rows and columns. But now the code cannot locate the table. I do not fully understand the "Set a = .FindElementsByTag("iframe")(2)" and ".SwitchToFrame 1" lines. My general understanding is that this portion of the code switches to a different frame and extracts the internal URL, which is then used to get the data from the table.
I need help identifying what to change in order to get the intended "url2", which is "https://docs.google.com/spreadsheets/d/e/2PACX-1vT__QigQ9cJV03ohUkeK5dgQjfAbJqxrc68bXh9Is1WFST8wjxMxDy7hYUCFHynqRvInsANUI22GdIM/pubhtml?gid=817544912&single=true&chrome=false&widget=false&headers=false". Note: I do not use this docs.google URL directly because I do not know whether it will change periodically. I know the rosterresource.com/mlb-roster-grid URL will stay consistent.
I have tried changing some of the integers in "Set a = .FindElementsByTag("iframe")(2)" and ".SwitchToFrame 1", but I am doing that blindly since I am not familiar with this part of the code.
Sub GetRRgrid()
'"Selenium type library" is a reference used
Dim d As WebDriver, a As Object
Set d = New ChromeDriver
Const url = "https://www.rosterresource.com/mlb-roster-grid/"
With d
.Start "Chrome"
.Get url
Set a = .FindElementsByTag("iframe")(2)
.SwitchToFrame 1
url2 = .FindElementByCss("iframe").Attribute("src")
.Get url2
ele = .FindElementByTag("tbody").Attribute("innerText")
d.Close
End With
' other processes to format the data after it is imported
End Sub
Getting the iframe and switching to it:
You need to pass the iframe element (the identifier argument) to SwitchToFrame; you are then inside that document and can interact with its contents. There is no need to .Get its URL with Selenium. Use .SwitchToDefaultContent to go back to the parent document.
You can identify the iframe in question in a number of ways. Modern browsers are optimized for CSS selectors, so I usually go with those. The CSS equivalent of
.FindElementByTag("iframe")
is
.FindElementByCss("iframe")
Your iframe is the first (and only) one, so I wouldn't bother gathering a set of webElements and indexing into it. Where possible, a short selector that matches a single element is also more efficient.
VBA:
Option Explicit
Public Sub Example()
Dim d As WebDriver
Const URL As String = "https://www.rosterresource.com/mlb-roster-grid/"
Set d = New ChromeDriver
With d
.Start "Chrome"
.get URL
.SwitchToFrame .FindElementByCss("iframe")
Stop
.Quit
End With
End Sub
Writing to Excel (.AsTable.ToExcel) :
Something I only just discovered, haven't seen documented anywhere, and am excited by, is that there is a method to write the table direct to Excel:
Option Explicit
Public Sub Example()
Dim d As WebDriver
Const URL As String = "https://www.rosterresource.com/mlb-roster-grid/"
Set d = New ChromeDriver
With d
.Start "Chrome"
.get URL
.SwitchToFrame .FindElementByTag("iframe")
.FindElementByCss(".waffle").AsTable.ToExcel ThisWorkbook.Worksheets("Sheet1").Range("A1")
Stop
.Quit
End With
End Sub
Here is what I ended up doing for this question. Thanks to QHarr for the guidance.
Public Sub GetRRrostergrid()
Dim d As WebDriver
Const URL As String = "https://www.rosterresource.com/mlb-roster-grid/"
Dim URL2 As String
Set d = New ChromeDriver
Sheet20.Activate
With d
.Start "Chrome"
.Get URL
URL2 = .FindElementByClass("post_content").FindElementByTag("iframe").Attribute("src")
.Get URL2
.FindElementByCss(".waffle").AsTable.ToExcel ThisWorkbook.Worksheets("RRchart").Range("b1")
.Quit
End With
End Sub

Resources