Table from HTTP responseText VBA Excel - excel

I am working with VBA and trying to create a table from the response text of an HTTP request. Here is my code:
Set hReq = CreateObject("MSXML2.XMLHTTP")
With hReq
.Open "GET", "https://example.url.com/data", False
.send
End With
If one navigated to the URL, the only item on the page is a CSV response that looks like this:
name,ID,job,sector johndoe,1234,creator,sector1 janedoe,5678,worker,sector2
This translates to a table with 4 columns named "name", "ID", "job", and "sector". I am pretty new to VBA and I am struggling to understand how to translate the response text into a table. But I need to get this into tabular form so I can work with the column variables. I can get the response text into a single cell:
Sheets("Sheet1").Range("A1").Value = hReq.responseText
However, I can't get the table into tabular format so I can begin working with it as I would a table. It would be great to get the data into an array in memory so that I could manipulate and analyze it using VBA, but for troubleshooting purposes, it would also be helpful to get it into an Excel Worksheet, so I can double-check my programming.

This splits the response into rows, splits each row into its fields, and writes every value to your preferred sheet:
Sub test()
Dim RespArray() As String
Dim RespArray2() As String
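' declare each counter explicitly; "Dim i, i2 As Long" would leave i as a Variant
Dim i As Long, i2 As Long
Dim hReq As Object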
Set hReq = CreateObject("MSXML2.XMLHTTP")
With hReq
.Open "GET", "https://example.url.com/data", False
.send
End With
' split the data into larger pieces
RespArray() = Split(hReq.responseText, " ")
' loop through first array to break it down line by line
For i = LBound(RespArray) To UBound(RespArray)
' split each line into individual pieces
RespArray2() = Split(RespArray(i), ",")
' loop through second array and add to Sheet1
For i2 = LBound(RespArray2) To UBound(RespArray2)
Worksheets("Sheet1").Cells(i + 1, i2 + 1).Value = RespArray2(i2)
Next i2
Next i
End Sub
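The question also asks for the data in an in-memory array. A minimal sketch of that, assuming the rows in the real response are separated by line breaks (as in a typical CSV file) and that no field is quoted or contains a comma; `CsvTextToArray` is a hypothetical helper name, not part of the code above:

```vba
Option Explicit

' Convert plain CSV text into a 1-based 2D Variant array (rows x columns).
' Assumption: rows end with CR, LF or CRLF and fields contain no quotes or embedded commas.
Public Function CsvTextToArray(ByVal csvText As String) As Variant
    Dim csvLines() As String, fields() As String
    Dim result() As Variant
    Dim r As Long, c As Long, colCount As Long

    ' Normalise all line endings to vbLf and drop trailing empty lines
    csvText = Replace(Replace(csvText, vbCrLf, vbLf), vbCr, vbLf)
    Do While Right$(csvText, 1) = vbLf
        csvText = Left$(csvText, Len(csvText) - 1)
    Loop
    If Len(csvText) = 0 Then Exit Function   ' nothing to parse, returns Empty

    csvLines = Split(csvText, vbLf)
    colCount = UBound(Split(csvLines(0), ",")) + 1   ' header row defines the width

    ReDim result(1 To UBound(csvLines) + 1, 1 To colCount)
    For r = 0 To UBound(csvLines)
        fields = Split(csvLines(r), ",")
        For c = 0 To UBound(fields)
            ' ignore any extra fields beyond the header width
            If c < colCount Then result(r + 1, c + 1) = fields(c)
        Next c
    Next r

    CsvTextToArray = result
End Function
```

With the array in hand, it can be inspected in code or written to a sheet in one assignment, for example `Worksheets("Sheet1").Range("A1").Resize(UBound(data, 1), UBound(data, 2)).Value = data` after `data = CsvTextToArray(hReq.responseText)`. If the rows really are separated by spaces, as the Split on " " above assumes, swap the line-break handling for that delimiter.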

Related

Scrape economic data from investing website

I am working on a code to get data from : https://www.investing.com/economic-calendar/core-durable-goods-orders-59
I have code that gets historical data via an HTTP request, but I am looking to change it to work for the economic data (link above). Is there any way I can get the same for the economic indicators?
code below:
Option Explicit
Sub Export_Table()
'Html Objects---------------------------------------'
Dim htmlDoc As MSHTML.HTMLDocument
Dim htmlBody As MSHTML.htmlBody
Dim ieTable As MSHTML.HTMLTable
Dim Element As MSHTML.HTMLElementCollection
'Workbooks, Worksheets, Ranges, LastRow, Incrementers ----------------'
Dim wb As Workbook
Dim Table As Worksheet
Dim i As Long
Set wb = ThisWorkbook
Set Table = wb.Worksheets("Sheet1")
'-------------------------------------------'
Dim xmlHttpRequest As New MSXML2.XMLHTTP60 '
'-------------------------------------------'
i = 2
'Web Request --------------------------------------------------------------------------'
With xmlHttpRequest
.Open "POST", "https://www.investing.com/instruments/HistoricalDataAjax", False
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.setRequestHeader "X-Requested-With", "XMLHttpRequest"
.send "curr_id=951681&smlID=1695217&header=CLNX+Historical+Data&st_date=01%2F01%2F2017&end_date=03%2F01%2F2019&interval_sec=Monthly&sort_col=date&sort_ord=DESC&action=historical_data"
If .Status = 200 Then
Set htmlDoc = CreateHTMLDoc
Set htmlBody = htmlDoc.body
htmlBody.innerHTML = xmlHttpRequest.responseText
Set ieTable = htmlDoc.getElementById("curr_table")
For Each Element In ieTable.getElementsByTagName("tr")
Table.Cells(i, 1) = Element.Children(0).innerText
Table.Cells(i, 2) = Element.Children(1).innerText
Table.Cells(i, 3) = Element.Children(2).innerText
Table.Cells(i, 4) = Element.Children(3).innerText
Table.Cells(i, 5) = Element.Children(4).innerText
Table.Cells(i, 6) = Element.Children(5).innerText
Table.Cells(i, 7) = Element.Children(6).innerText
i = i + 1
DoEvents: Next Element
End If
End With
Set xmlHttpRequest = Nothing
Set htmlDoc = Nothing
Set htmlBody = Nothing
Set ieTable = Nothing
Set Element = Nothing
End Sub
Public Function CreateHTMLDoc() As MSHTML.HTMLDocument
Set CreateHTMLDoc = CreateObject("htmlfile")
End Function
I have used the Excel tool Power Query for this very thing. It is also called Get & Transform Data. I don't like using VBA for this type of thing.
To make it work:
In Excel, go to Data > Get Data > From Other Sources > From Web.
Enter the URL
Wait for the webpage to load and then pick the table that you want.
This website took awhile to load, but it did work for me.
Choose "Load" which goes directly to the sheet, or "Transform Data" to manipulate the data in Power Query. There are many options in power query such as split columns, filter data, Calculate Columns and ...
I would avoid the overhead of setting up a permanent connection and simply continue using XHR. With Data > From Web, you cannot grab more rows than are present on the initial landing page. With XHR, however, you can issue POST requests to get more data. The code below uses a loop to retrieve additional results beyond those immediately visible on the page.
When you press the Show more link there is a POST request for an additional 6 rows which uses the latest date from the current set of results as part of the POST body. The response returned is JSON. Rather than bringing in a JSON parser, given the standard nature of the JSON, and that I am already using regex to clean the date format in column 1 to put in the POST body, I use two simple regexes to extract the html for the next results table from the response, and to check whether there are more results.
The format of the JSON is:
{
"historyRows": "<tr>…..</tr>",
"hasMoreHistory": "1"
}
Or
{
"historyRows": "<tr>…..</tr>",
"hasMoreHistory": false
}
So, I do some cleaning of the extracted html in order to not confuse the html parser within MSHTML. Furthermore, I add in an id to identify the table I have constructed, so I can continue to use an id css selector (#) list within my UpdateDateResults function.
I initially oversize an array to store each retrieved table which I update ByRef. I loop requesting more results until either there are no more results, there is an error parsing the maximum date from the last retrieved table column 1, or until my specified earliest date for data retrieval falls within the date range of the latest returned table.
Finally, I write the results array out to the sheet in one go.
N.B. You can target the table by its id. It looks like the number at the end of the id could be the same as for the goods url, lending itself to generalizing the code below to work for other goods.
VBA:
Option Explicit
Public Sub GetInvestingInfo()
'tools > references > Microsoft HTML Object Library
Dim html As MSHTML.HTMLDocument, xhr As Object
Const STARTDATE As Date = "2019-11-25" 'Adjust as required. DateAdd("yyyy", -2, Date) 2 years back. This means may have some earlier months in _
batch that spans the start date but won't issue an additional request after this
Set xhr = CreateObject("MSXML2.XMLHTTP")
Set html = New MSHTML.HTMLDocument
With xhr
.Open "GET", "https://www.investing.com/economic-calendar/core-durable-goods-orders-59", False
.setRequestHeader "User-Agent", "Safari/537.36"
.send
html.body.innerHTML = .responseText
End With
Dim firstTable As Boolean, r As Long, results() As Variant
ReDim results(1 To 100000, 1 To 5)
'process initial table and update results, get cleaned date needed for request for more results
firstTable = True
Dim latestDate As String
UpdateDateResults latestDate, results, firstTable, r, html
Dim re As Object, maxDate As String, hasMoreHistory As Boolean, s As String
Set re = CreateObject("VBScript.RegExp")
With re
.Global = True
.MultiLine = False
End With
maxDate = cleanedDate(latestDate, re)
hasMoreHistory = True
Dim errorDate As Date
errorDate = DateAdd("d", 1, Date)
Do While maxDate >= STARTDATE And maxDate < errorDate 'break loop using pre-defined earliest date, error with date conversion, or when no more rows found
Application.Wait (Now + TimeSerial(0, 0, 1)) 'Pause
s = GetMoreRows(xhr, Format$(maxDate, "YYYY-MM-DD")) 'make a POST request for more data
re.Pattern = "hasMoreHistory"":(""?.*?""?)}" 'Check if there are more rows still available. "1" for yes, false for no
hasMoreHistory = (re.Execute(s)(0).submatches(0) <> False)
If Not hasMoreHistory Then Exit Do
re.Pattern = "historyRows"":""(.*)"","
html.body.innerHTML = "<table id=""me"">" & Replace$(re.Execute(s)(0).submatches(0), "\/", "/") & "</table>" 'fix html and feed into html variable
UpdateDateResults latestDate, results, firstTable, r, html
maxDate = cleanedDate(latestDate, re) 'convert value retrieved from last row in date column of table to an actual date
Loop
With ActiveSheet
.Cells.ClearContents
.Cells(1, 1).Resize(r, 5) = results 'Don't bother to resize results as all cells are cleared before write out
End With
End Sub
Public Sub UpdateDateResults(ByRef latestDate As String, ByRef results() As Variant, ByRef firstTable As Boolean, ByRef r As Long, ByVal html As MSHTML.HTMLDocument)
Dim table As MSHTML.HTMLTable 'return latest date from function
Set table = html.querySelector("#eventHistoryTable59, #me")
latestDate = table.Rows(table.Rows.Length - 1).Children(0).innerText
Dim i As Long, n As Long, j As Long
n = IIf(firstTable, 0, 1)
For i = n To table.Rows.Length - 1
r = r + 1
For j = 0 To table.Rows(i).Children.Length - 2
results(r, j + 1) = table.Rows(i).Children(j).innerText
Next
Next
firstTable = False
End Sub
Public Function cleanedDate(ByVal dirtyString As String, ByVal re As Object) As Date
re.Pattern = "(^[A-Z][a-z]{2}).*(\d{2}),.(\d{4})(.*)"
On Error GoTo errhand:
If re.test(dirtyString) Then
cleanedDate = CDate(re.Replace(dirtyString, "$2" & Chr$(32) & "$1" & Chr$(32) & "$3"))
Exit Function
End If
errhand:
cleanedDate = DateAdd("d", 1, Date)
End Function
Public Function GetMoreRows(ByVal xhr As Object, ByVal dateStamp As String) As String
With xhr
.Open "POST", "https://www.investing.com/economic-calendar/more-history", False
.setRequestHeader "User-Agent", "Safari/537.36"
.setRequestHeader "x-requested-with", "XMLHttpRequest"
.setRequestHeader "content-type", "application/x-www-form-urlencoded"
.send "eventID=430865&event_attr_ID=59&event_timestamp=" & dateStamp & "+" & Application.WorksheetFunction.EncodeURL("12:30:00") & "&is_speech=0"
GetMoreRows = .responseText
End With
End Function
Regexes (without the double " escaping for VBA):
hasMoreHistory":("?.*?"?)}
historyRows":"(.*)",
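To see what the two patterns capture, here is a minimal sketch that applies them to a made-up response string shaped like the JSON described above. `FirstSubmatch`, `DemoRegexes` and the sample string are illustrative assumptions, not part of the answer's code or of the site's actual output:

```vba
Option Explicit

' Apply a pattern to a string and return the first capture group ("" if there is no match).
Public Function FirstSubmatch(ByVal regexPattern As String, ByVal inputText As String) As String
    Dim re As Object, matches As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = regexPattern
    Set matches = re.Execute(inputText)
    If matches.Count > 0 Then FirstSubmatch = matches(0).SubMatches(0)
End Function

Public Sub DemoRegexes()
    ' Hypothetical sample, following the format {"historyRows":"...","hasMoreHistory":"1"}
    Dim s As String
    s = "{""historyRows"":""<tr><td>Nov 27, 2019<\/td><\/tr>"",""hasMoreHistory"":""1""}"

    ' Flag: the captured text keeps its surrounding quotes when the value is a string
    Debug.Print FirstSubmatch("hasMoreHistory"":(""?.*?""?)}", s)

    ' Rows: unescape \/ to / before handing the html to the parser
    Debug.Print Replace$(FirstSubmatch("historyRows"":""(.*)"",", s), "\/", "/")
End Sub
```

The greedy `(.*)` in the second pattern runs to the last quote-comma pair in the string, which is why it relies on `historyRows` being followed directly by the final `hasMoreHistory` field.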

Converting weird characters and symbols into normal language in excel

I am using VBA code to extract information from a website into Excel cells. The numerical information is fine, but I have a problem with text strings. I am mostly extracting information from Georgian websites, and text in the Georgian language is not displayed properly in Excel, so I was wondering if there is any way (code or something else) to convert these symbols into proper language.
Sub GetData()
Dim request As Object
Dim response As String
Dim html As New HTMLDocument
Dim website As String
Dim price As Variant
Dim address As Variant
Dim x As Integer
Dim y As Range
x = 1
Do Until x = 9
Set y = Worksheets(1).Range("A21:A200"). _
Find(x, LookIn:=xlValues, lookat:=xlWhole)
website = "https://www.myhome.ge/ka/pr/11247371/iyideba-Zveli-ashenebuli-bina-veraze-T.-WoveliZis-qucha"
' Create the object that will make the webpage request.
Set request = CreateObject("MSXML2.XMLHTTP")
' Where to go and how to go there.
request.Open "GET", website, False
' Get fresh data.
request.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
' Send the request for the webpage.
request.send
' Get the webpage response data into a variable.
response = StrConv(request.responseBody, vbUnicode)
' Put the webpage into an html object.
html.body.innerHTML = response
' Get info from the specified element on the page.
address = html.getElementsByClassName("address").Item(0).innerText
price = html.getElementsByClassName("d-block convertable").Item(0).innerText
y.Offset(0, 1).Value = address
y.Offset(0, 5).Value = price
x = x + 1
Loop
End Sub
This is code that I took from a YouTube video (https://www.youtube.com/watch?v=IOzHacoP-u4) and slightly modified. It works; I just have a problem with how Excel displays the characters in text strings.
For your issue in the question
Remove this line response = StrConv(request.responseBody, vbUnicode) as it's not required.
Change html.body.innerHTML = response to html.body.innerHTML = request.responseText.
For your issue in comment
The ID of the property can be retrieved from the class id-container, though you will need to do some string processing to remove the extra ":" :
propertyID = Trim$(Replace(html.getElementsByClassName("id-container")(0).innerText, ":", vbNullString))
Note: You should try to avoid declaring variables as Variant. The innerText property returns a String, so you should declare address and price as String.
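If you would rather decode the raw bytes yourself than rely on responseText, one commonly used approach is an ADODB.Stream with an explicit charset. This is a sketch under the assumption that the page is served as UTF-8; `Utf8BytesToString` is a hypothetical helper name and is not part of the fix described above:

```vba
Option Explicit

' Decode a byte array (e.g. request.responseBody) as UTF-8 text.
' Assumption: the bytes really are UTF-8; change .Charset if the site uses another encoding.
Public Function Utf8BytesToString(ByVal bytes As Variant) As String
    Const adTypeBinary As Long = 1
    Const adTypeText As Long = 2

    With CreateObject("ADODB.Stream")
        .Type = adTypeBinary        ' write the raw bytes first
        .Open
        .Write bytes
        .Position = 0               ' rewind before switching the stream type
        .Type = adTypeText
        .Charset = "utf-8"
        Utf8BytesToString = .ReadText
        .Close
    End With
End Function
```

It would be used as `html.body.innerHTML = Utf8BytesToString(request.responseBody)` in place of the StrConv line.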

Extract web table to Excel in VBA when table is generated by script

I'm trying to extract the stats table from This page into Excel so that I can refresh automatically. I tried navigating through the Get External Data process, but where the DIV with the table should be, it shows as null instead of as a Table element. I suspect it's how the site is coded, as looking at the source code, that table is generated from a script, even though it just looks like standard table nomenclature as a table element in the actual display:
So I tried some macros for this purpose, such as the following:
Sub Extract_data()
Dim url As String, links_count As Integer
Dim i As Integer, j As Integer, row As Integer
Dim XMLHTTP As Object, html As Object
Dim tr_coll As Object, tr As Object
Dim td_coll As Object, td As Object
links_count = 39
For i = 0 To links_count
url = "http://www.admision.unmsm.edu.pe/res20130914/A/011/" & i & ".html"
Set XMLHTTP = CreateObject("MSXML2.XMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.send
Set html = CreateObject("htmlfile")
html.body.innerHTML = XMLHTTP.ResponseText
Set tbl = html.getelementsbytagname("Table")
Set tr_coll = tbl(0).getelementsbytagname("TR")
For Each tr In tr_coll
j = 1
Set td_col = tr.getelementsbytagname("TD")
For Each td In td_col
Cells(row + 1, j).Value = td.innerText
j = j + 1
Next
row = row + 1
Next
Next
End Sub
However, that only pulls the same tables I have access to already with Get External Data. It doesn't seem they can 'see' the table on the page. Is there a way to VBA code such that it will pull the actual table on the page, instead of just checking the page source code?
On a sidenote, you can see from the screenshot code above that there is a CSV export link on the page. However, because this is generated with JS, it just shows up as a huge string of characters beginning with data:application and not an actual link that can be refreshed (and I suspect it could not be anyway since the characters likely change as the table parameters do). It does have a download attribute attached for the filename, is there a way to work backwards from that attribute to get Excel to find the file?
I'll take any method I can get. Thanks!

VBA With CreateObject("msxml2.xmlhttp") - getting data from table with irregular structure

I've aged 5 years spending hours trying to solve this and spent hours and hours trying to understand it, so here goes :)
I am trying to extract some tables from this company page on Market Screener using the CreateObject method.
Taking table(25) as an example, I am trying to extract the table "Type of business", and only the first column, which lists the business types (not the 2016, 2017 and Delta columns).
I found a head start online in this 2016 Stack Overflow thread:
Dim oDom As Object: Set oDom = CreateObject("htmlFile")
Dim x As Long, y As Long
Dim oRow As Object, oCell As Object
Dim vData As Variant
Dim link As String
link = "https://www.marketscreener.com/COLUMBIA-SPORTSWEAR-COMPA-8859/company/"
y = 1: x = 1
With CreateObject("msxml2.xmlhttp")
.Open "GET", link, False
.send
oDom.body.innerHTML = .responseText
End With
With oDom.getElementsByTagName("table")(25)
ReDim vData(1 To .Rows.Length, 1 To 11) '.Rows(1).Cells.Length)
For Each oRow In .Rows
For Each oCell In oRow.Cells
vData(x, y) = oCell.innerText
y = y + 1
Next oCell
y = 1
x = x + 1
Next oRow
End With
Sheets(2).Cells(66, 2).Resize(UBound(vData), UBound(vData, 2)).Value = vData
It sort of works, but it returns a jumbled table with all of the data in a single cell.
I then found another tweak online, which was this, which suggests copy and pasting and letting Excel work out how to paste it in, which sort of works too:
With oDom.getElementsByTagName("table")(25)
Dim dataObj As Object
Set dataObj = CreateObject("new:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
dataObj.SetText "<table>" & .innerHTML & "</table>"
dataObj.PutInClipboard
End With
Sheets(2).Paste Sheets(2).Cells(66, 1)
This pastes the result more or less correctly, but it brings formatting with it rather than just the values. What I am after is the equivalent of Paste Special > Values, without any formatting.
This is driving me a bit nuts. I get the concept but am completely stuck at the moment. Is there a way to do it? With a head start, I can then replicate it on other tables on that page and on other tabs.
Any help greatly appreciated,
Best Regards,
Paul
If you have Excel 2010+, you can do this using Power Query.
You can set up a query to get this Data from the Web.
The PQ code would be:
let
Source = Web.Page(Web.Contents("https://www.marketscreener.com/COLUMBIA-SPORTSWEAR-COMPA-8859/company/")),
myData = Source{3}[Data],
firstColumn = {List.First(Table.ColumnNames(myData))},
#"Removed Other Columns" = Table.SelectColumns(myData,firstColumn),
#"Removed Blank Rows" = Table.SelectRows(#"Removed Other Columns", each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null})))
in
#"Removed Blank Rows"
This results in:
And the query can be refreshed, edited, etc.
As written, the query will keep the first column of the desired table. You can decide which table to process by changing the number in Source{n}. 3 happens to be the one you are interested in, but there are 11 or 12 tables, if I recall correctly.
Taking your given example you can use a combination of class and type (tag) to select those elements. Same logic applies for next table as well. The problem here is you really have to inspect the html and tailor what you do. Otherwise, the easy solution, which you didn't want, is to use the clipboard.
Option Explicit
Public Sub GetTableInfo()
Dim html As HTMLDocument
Set html = New HTMLDocument '< VBE > Tools > References > Microsoft HTML Object Library
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.marketscreener.com/COLUMBIA-SPORTSWEAR-COMPA-8859/company/", False
.send
html.body.innerHTML = .responseText
End With
Dim leftElements As Object, td As Object
'.tabElemNoBor.fvtDiv tr:nth-of-type(2) td.nfvtTitleLeft
Set leftElements = html.getElementsByClassName("tabElemNoBor fvtDiv")(0).getElementsByTagName("tr")(2)
For Each td In leftElements.getElementsByTagName("td")
If td.className = "nfvtTitleLeft" Then
Debug.Print td.innerText
End If
Next
End Sub

How to record API Ticker data in an Excel table

I have no VBA experience so I am hoping that there is a way to do this without use of macros or programming - If there isn't then help with code and explaining what it is doing, so I can learn from it, would be very much appreciated. :)
I am using a daily refreshed API ticker which gives me a date and a value.
I then have a table predefined for the year, 01/01/18 > 31/12/18 (for example), adjacent to a cell for the value.
I’ve used vlookup to populate the value on the given day, but obviously in this current state, the data is not recordable, so when the date on the API changes from 01/01/18 to 02/01/18 the value is lost and it moves onto the next specified cell to fill.
Is there a way to record/ store this data – Make it non external automatically? Without copy/paste text or value manually?
The data you are fetching from that API is JSON. Unfortunately, VBA has no built-in support for JSON. Some people have written libraries for it, but since you are new to VBA and the JSON response is very small, I think it's best to just treat the response from the API as a string and get what we need by parsing that string.
An example of what this would look like for that URL (appending whatever is fetched to Sheet1 columns A, B, C, and D):
Sub getTickerValue()
'Get the data from the API
Dim strResponse As String: strResponse = LoadHTML("https://api.fixer.io/latest?symbols=USD,GBP")
'Since we aren't actually going to parse the json because it's not well supported in VBA
' we will instead remove everything we don't care about and parse the results
' So replace out double quotes and squirrely braces (Not a great idea for more complex json)
strResponse = Replace(strResponse, Chr(34), "")
strResponse = Replace(strResponse, "}", "")
strResponse = Replace(strResponse, "{", "")
'Load up each item into an array splitting on comma
Dim jsonArray As Variant: jsonArray = Split(strResponse, ",")
'Loop the array, sniff for the data we want, and toss it in its respective variable
Dim strBase As String, strDate As String, strRate1 As String, strRate2 As String
For Each elem In jsonArray
If Split(elem, ":")(0) = "base" Then strBase = Split(elem, ":")(1)
If Split(elem, ":")(0) = "date" Then strDate = Split(elem, ":")(1)
If Split(elem, ":")(0) = "rates" Then strRate1 = Split(elem, ":")(2)
If Split(elem, ":")(0) = "USD" Then strRate2 = Split(elem, ":")(1)
Next elem
'Set up the range where we will output this by starting at cell A99999
' in Sheet1 and going up until we hit the first occupied cell
' offset by 1 row to get the first unoccupied cell
Dim outRange As Range
Set outRange = Sheet1.Range("A99999").End(xlUp).Offset(1)
'Now we know the last unoccupied cell in Sheet1, go ahead and dump the data
outRange.Value = strBase
outRange.Offset(, 1).Value = strDate
outRange.Offset(, 2).Value = strRate1
outRange.Offset(, 3).Value = strRate2
End Sub
Function LoadHTML(xmlurl) As String
'Using the XMLHTTP library to get the results since monkeying with IE is ugly and painful
Dim xmlhttp
Set xmlhttp = CreateObject("MSXML2.XMLHTTP")
xmlhttp.Open "GET", xmlurl, False
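' switch to manual error handling
On Error Resume Next
xmlhttp.Send
If Err.Number <> 0 Then
' WScript is not available in VBA, so report the failure in the Immediate window instead
Debug.Print "Request failed: " & Err.Description
Err.Clear
On Error GoTo 0
Exit Function ' returns an empty string
End If
' switch back to automatic error handling
On Error GoTo 0
LoadHTML = xmlhttp.responseText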
End Function
This isn't exactly what you are looking for, but I think it's close enough to get you in the ballpark. You can run it by creating a button or shape on the sheet and pointing it at the "getTickerValue" macro. Alternatively, after pasting this into a new VBA module, you can put your cursor in the getTickerValue block of code and hit the play button at the top (or F5). It will fetch the data and append it to whatever your Sheet1 is.
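The same treat-it-as-a-string idea can also be wrapped in a small lookup helper, which avoids stripping the quotes and braces up front. This is only a sketch: `JsonValue` is a hypothetical name, the sample string in the usage note is made up to resemble the API response, and it only copes with simple values that contain no commas or closing braces.

```vba
Option Explicit

' Return the value that follows "key": in a small JSON string ("" if the key is absent).
' Assumption: the value is a plain number or a quoted string without commas or braces.
Public Function JsonValue(ByVal json As String, ByVal keyName As String) As String
    Dim startPos As Long, endPos As Long

    startPos = InStr(1, json, """" & keyName & """:")
    If startPos = 0 Then Exit Function
    startPos = startPos + Len(keyName) + 3      ' skip past "key":

    ' The value runs until the next comma or closing brace
    endPos = startPos
    Do While endPos <= Len(json)
        If InStr(",}", Mid$(json, endPos, 1)) > 0 Then Exit Do
        endPos = endPos + 1
    Loop

    ' Trim whitespace and drop the quotes around string values
    JsonValue = Replace(Trim$(Mid$(json, startPos, endPos - startPos)), """", "")
End Function
```

For a response such as `{"base":"EUR","date":"2018-01-02","rates":{"GBP":0.88,"USD":1.2}}`, calling `JsonValue(strResponse, "date")` or `JsonValue(strResponse, "USD")` picks out a single field, which can then be written to the next free row in the same way as in getTickerValue.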

Resources