Converting weird characters and symbols into normal language in excel - excel

I am using the VBA code to extract information from a website into excel cells, and the numerical information is fine but I have a problem with text strings. I am mostly extracting information from Georgian websites, and the texts with the Georgian language are not properly displayed in excel, so I was wondering if there is any chance (code or something else) I could convert these symbols into proper language.
Sub GetData()
Dim request As Object
Dim response As String
Dim html As New HTMLDocument
Dim website As String
Dim price As Variant
Dim address As Variant
Dim x As Integer
Dim y As Range
x = 1
Do Until x = 9
Set y = Worksheets(1).Range("A21:A200"). _
Find(x, LookIn:=xlValues, lookat:=xlWhole)
website = "https://www.myhome.ge/ka/pr/11247371/iyideba-Zveli-ashenebuli-bina-veraze-T.-WoveliZis-qucha"
' Create the object that will make the webpage request.
Set request = CreateObject("MSXML2.XMLHTTP")
' Where to go and how to go there.
request.Open "GET", website, False
' Get fresh data.
request.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
' Send the request for the webpage.
request.send
' Get the webpage response data into a variable.
response = StrConv(request.responseBody, vbUnicode)
' Put the webpage into an html object.
html.body.innerHTML = response
' Get info from the specified element on the page.
address = html.getElementsByClassName("address").Item(0).innerText
price = html.getElementsByClassName("d-block convertable").Item(0).innerText
y.Offset(0, 1).Value = address
y.Offset(0, 5).Value = price
x = x + 1
Loop
End Sub
This is the code that I took from a youtube video (https://www.youtube.com/watch?v=IOzHacoP-u4) and slightly modified, and it works, I just have a problem with how excel displays the characters in text strings.

For your issue in the question
Remove this line response = StrConv(request.responseBody, vbUnicode) as it's not required.
Change html.body.innerHTML = response to html.body.innerHTML = request.responseText.
For your issue in comment
To retrieve the ID of the property, it can be retrieved from the class id-container, you will need to perform some string processing though to remove the extract :
propertyID = Trim$(Replace(html.getElementsByClassName("id-container")(0).innerText, ":", vbNullString))
Note: You should try to avoid declaring variable as Variant. innerText property returns a String datatype so you should declare address and price as String.

Related

Extract web table to Excel in VBA when table is generated by script

I'm trying to extract the stats table from This page into Excel so that I can refresh automatically. I tried navigating through the Get External Data process, but where the DIV with the table should be, it shows as null instead of as a Table element. I suspect it's how the site is coded, as looking at the source code, that table is generated from a script, even though it just looks like standard table nomenclature as a table element in the actual display:
So I tried some macros for this purpose, such as the following:
Sub Extract_data()
Dim url As String, links_count As Integer
Dim i As Integer, j As Integer, row As Integer
Dim XMLHTTP As Object, html As Object
Dim tr_coll As Object, tr As Object
Dim td_coll As Object, td As Object
links_count = 39
For i = 0 To links_count
url = "http://www.admision.unmsm.edu.pe/res20130914/A/011/" & i & ".html"
Set XMLHTTP = CreateObject("MSXML2.XMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.send
Set html = CreateObject("htmlfile")
html.body.innerHTML = XMLHTTP.ResponseText
Set tbl = html.getelementsbytagname("Table")
Set tr_coll = tbl(0).getelementsbytagname("TR")
For Each tr In tr_coll
j = 1
Set td_col = tr.getelementsbytagname("TD")
For Each td In td_col
Cells(row + 1, j).Value = td.innerText
j = j + 1
Next
row = row + 1
Next
Next
End Sub
However, that only pulls the same tables I have access to already with Get External Data. It doesn't seem they can 'see' the table on the page. Is there a way to VBA code such that it will pull the actual table on the page, instead of just checking the page source code?
On a sidenote, you can see from the screenshot code above that there is a CSV export link on the page. However, because this is generated with JS, it just shows up as a huge string of characters beginning with data:application and not an actual link that can be refreshed (and I suspect it could not be anyway since the characters likely change as the table parameters do). It does have a download attribute attached for the filename, is there a way to work backwards from that attribute to get Excel to find the file?
I'll take any method I can get. Thanks!

Web Scraping - StockCharts - getElementsByTagName ("a")

I am trying to get the inner text and href attribute of the column Name at this website:
https://stockcharts.com/freecharts/sectorsummary.html?&G=SECTOR_DJUSNS&O=1
but I get all hyperlinks except the ones inside the table.
Can somebody please take a look at this code and let me know what is wrong?
Sub Scraping_StockCharts()
Dim XMLPage As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim HTMLIm As MSHTML.IHTMLElement
Dim HTMLIms As MSHTML.IHTMLElementCollection
Dim URL As String
URL = "https://stockcharts.com/freecharts/sectorsummary.html?&G=SECTOR_DJUSNS&O=1"
XMLPage.Open "Get", URL, False
XMLPage.setRequestHeader "Content-Type", "text/xml"
XMLPage.send
HTMLDoc.body.innerHTML = XMLPage.responseText
Row = 1
Set HTMLIms = HTMLDoc.getElementsByTagName("a")
For Each HTMLIm In HTMLIms
Sheets("Results").Cells(Row, 2).Value = HTMLIm.innerText
Sheets("Results").Cells(Row, 3).Value = HTMLIm.getAttribute("href")
Row = Row + 1
Next HTMLIm
End Sub
Scraping is not allowed via xmlhttp. Not sure about automating a browser. You will need to read terms of service carefully. With browser automation you could just use the url you have I suspect.
From an intellectual point of view the data can be got from https://c.stockcharts.com/j-sum/sum?cmd=perf&group=SECTOR_DJUSNS which will return JSON. From that json you can reconstruct the url by accessing the sym value for each dictionary in the list of dictionaries returned. Concatenate that sym value onto the end of base string of https://stockcharts.com/h-sc/ui?s=
e.g. for first dictionary in list
https://stockcharts.com/h-sc/ui?s= + sym
gives
https://stockcharts.com/h-sc/ui?s=TKAT
Basically, the server expects a query string and returns json. The page uses this to update content. This can be viewed in network tab of browser when refreshing page.
You might be better off looking for a free API that serves similar data.

Table from HTTP responseText VBA Excel

I am working with VBA and trying to create a table from a response test using HTTP request. Here is my code:
Set hReq = CreateObject("MSXML2.XMLHTTP")
With hReq
.Open "GET", "https://example.url.com/data", False
.send
End With
If one navigated to the URL, the only item on the page is a CSV response that looks like this:
name,ID,job,sector johndoe,1234,creator,sector1 janedoe,5678,worker,sector2
This translates to a table with 4 columns named "name", "ID", "job", and "sector". I am pretty new to VBA and I am struggling to understand how to translate the response text into a table. But I need to get this into tabular form so I can work with the column variables. I can get the response text into a single cell:
Sheets("Sheet1").Range("A1").Value = hReq.responseText
However, I can't get the table into tabular format so I can begin working with it as I would a table. It would be great to get the data into an array in memory so that I could manipulate and analyze it using VBA, but for troubleshooting purposes, it would also be helpful to get it into an Excel Worksheet, so I can double-check my programming.
This loops through your header request and posts to your preferred sheet:
Sub test()
Dim RespArray() As String
Dim RespArray2() As String
Dim i, i2 As Long
Set hReq = CreateObject("MSXML2.XMLHTTP")
With hReq
.Open "GET", "https://example.url.com/data", False
.send
End With
' split the data into larger pieces
RespArray() = Split(hReq.responseText, " ")
' loop through first array to break it down line by line
For i = LBound(RespArray) To UBound(RespArray)
' split each line into individual pieces
RespArray2() = Split(RespArray(i), ",")
' loop through second array and add to Sheet1
For i2 = LBound(RespArray2) To UBound(RespArray2)
Worksheets("Sheet1").Cells(i + 1, i2 + 1).Value = RespArray2(i2)
Next i2
Next i
End Sub
Results in

Image url extraction using excel vba

Iam working with excel 2016. I need to extract the link of an image from a website using VBA in excel.
Example, i have a website that shows a product with the link : https://www.hikvision.com/en/products/Turbo-HD-Products/Turbo-HD-Cameras/Value-Series/ds-2ce56d0t-vpir3f/
My image is into a div , like that :
<div class="slide-image" style="background-image: url('/content/dam/hikvision/products/HIKVISION/Turbo_HD_Products/Turbo_HD_Cameras/Value_Series/D0T_Series/DS-2CE56D0T-VPIR3F/images/2CE56D0T-半球11-正视图.png.thumb.1280.1280.png');"></div>
I tried this :
Private Sub btnExtractURL_Click()
Dim sourceString As String
Dim rowIdx As Integer, rowMax As Integer
Dim posFirst As Integer, posLast As Integer, chrLength As Integer
rowMax = Range("A3").End(xlDown).Row
' ---
For rowIdx = 3 To rowMax
If Cells(rowIdx, 1).Value <> "" Then
Cells(rowIdx, 2).Value = ""
sourceString = Cells(rowIdx, 1).Value
posFirst = InStr(sourceString, "http")
posLast = InStr(posFirst, sourceString, """")
chrLength = (posLast - 1) - (posFirst - 1)
Cells(rowIdx, 2).Value = Mid(sourceString, posFirst, chrLength)
End If
Next
' ---
MsgBox "finished"
End Sub
But i have an error with this solution... I tried to extract the text to see another method, and it work's but when i insert the class of that image, it doesn't work !
Sub Get_Web_Data()
Dim request As Object
Dim response As String
Dim html As New HTMLDocument
Dim website As String
Dim price As Variant
' Website to go to.
website = "https://www.hikvision.com/en/products/Turbo-HD-Products/Turbo-HD-Cameras/Value-Series/ds-2ce56d0t-vpir3f/"
' Create the object that will make the webpage request.
Set request = CreateObject("MSXML2.XMLHTTP")
' Where to go and how to go there - probably don't need to change this.
request.Open "GET", website, False
' Get fresh data.
request.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
' Send the request for the webpage.
request.send
' Get the webpage response data into a variable.
response = StrConv(request.responseBody, vbUnicode)
' Put the webpage into an html object to make data references easier.
html.body.innerHTML = response
' Get the price from the specified element on the page.
Name = html.getElementsByClassName("prod_name").Item(0).innerText
' Output the price into a message box.
MsgBox Name
End Sub
Can you give an idea to extract this image and copy the link into my excel ?

how to get inner text of html under id?

I am trying to pull data pull inner text under id in excel cell.
This is for XML code.
Sub getelementbyid()
Dim XMLpage As New MSXML2.XMLHTTP60
Dim hdoc As New MSHTML.HTMLDocument
Dim HBEs As MSHTML.IHTMLElementCollection
Dim HBE As MSHTML.IHTMLElement
Dim ha As String
XMLpage.Open "GET","https://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuote.jsp?symbol=HAL", False
XMLpage.send
hdoc.body.innerHTML = XMLpage.responseText
ha = hdoc.getelementbyid("open").innerText
Range("K11").Value = ha
Debug.Print ha
End Sub
I expect output value, but it shows --.
Examine the response text. There is a difference in the way the page is rendered in the browser versus what is returned in the ResponseText.
I put the URL into a browser went into dev tools (F12), found the element, and noted the numeric value inside the HTML element.
Then I dumped the response text we're getting in VBA into a cell and copied the entire cell value into Notepad++. If you do that you'll see the initial value inside the #open element is indeed "--".
The real value appears to be getting written into the HTML via JavaScript, which is common practice. There is a JSON object at the top of the page, presumably injected into the document from the back-end of the website upon your request.
So you have to parse the JSON, not the HTML. I've provided code doing just that. Now, there may be a better way to do it, I feel this code is kind of "hacky" but it's getting the job done for your example URL.
Sub getelementbyid()
Dim XMLpage As New MSXML2.XMLHTTP60
Dim hdoc As New MSHTML.HTMLDocument
Dim HBEs As MSHTML.IHTMLElementCollection
Dim HBE As MSHTML.IHTMLElement
Dim ha As String
XMLpage.Open "GET", "https://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuote.jsp?symbol=HAL", False
XMLpage.send
'// sample: ,"open":"681.05",
Dim token As String
token = """open"":"""
Dim startPosition As Integer
startPosition = InStr(1, XMLpage.responseText, token)
Dim endPosition As Integer
endPosition = InStr(startPosition, XMLpage.responseText, ",")
Dim prop As String
prop = Mid(XMLpage.responseText, startPosition, endPosition - startPosition)
prop = Replace(prop, """", vbNullString)
prop = Replace(prop, "open:", vbNullString)
Dim val As Double
val = CDbl(prop)
ha = val
Range("K11").Value = ha
Debug.Print ha
End Sub
Here are two methods. 1) Using regex on the return text. Usually frowned upon but perfectly serviceable here. 2) Traditional extract json string and use json parser to parse out value.
The data you want is stored in a json string found both on the webpage and the xmlhtttp response, under the same element:
This means you can treat the html as a string and target just the pattern for the open price using regex as shown below, or parse the xmlhttp request into an html parser, grab the required element, extract its innerText and trim off the whitespace, then pass to a json parser to extract the open price.
In both methods you want to avoid being served cached results so the following header is an important addition to attempt to mitigate for this:
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
There is no need for addtional cell formatting. Full value comes out for both your tickers.
Regex:
It is present in a json string in the response. You can regex it out easily from return text.
Regex explanation:
VBA:
Option Explicit
Public Sub GetClosePrice()
Dim ws As Worksheet, re As Object, p As String, r As String
Set ws = ThisWorkbook.Worksheets("Sheet1")
p = """open"":""(.*?)"""
Set re = CreateObject("VBScript.RegExp")
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuote.jsp?symbol=HAL", False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
If .Status = 200 Then
r = GetValue(re, .responseText, p)
Else
r = "Failed connection"
End If
End With
ws.Range("K11").Value = r
End Sub
Public Function GetValue(ByVal re As Object, ByVal inputString As String, ByVal pattern As String) As String
With re
.Global = True
.pattern = pattern
If .test(inputString) Then ' returns True if the regex pattern can be matched agaist the provided string
GetValue = .Execute(inputString)(0).submatches(0)
Else
GetValue = "Not found"
End If
End With
End Function
HTML and json parser:
This requires installing code for jsonparser from jsonconverter.bas in a standard module called JsonConverter and then going VBE>Tools>References>Add a reference to Microsoft Scripting Runtime and Microsoft HTML Object Library.
VBA:
Option Explicit
Public Sub GetClosePrice()
Dim ws As Worksheet, re As Object, r As String, json As Object
Set ws = ThisWorkbook.Worksheets("Sheet1")
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuote.jsp?symbol=MRF", False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
If .Status = 200 Then
Dim html As HTMLDocument
Set html = New HTMLDocument
html.body.innerHTML = .responseText
Set json = JsonConverter.ParseJson(Trim$(html.querySelector("#responseDiv").innerText))
r = json("data")(1)("open")
Else
r = "Failed connection"
End If
End With
ws.Range("K11").Value = r
End Sub

Resources