How can I scrape worded information from a website? - excel

I am new to VBA and HTML coding in general, so I apologise if I misunderstand basic terms or use them incorrectly. I want to create and run a macro in Excel for work that would make my job a lot easier. Essentially, I need to grab a bunch of information from a real estate website: address, list price, listing agency, auction date (if any), etc. I have spent the last 4 hours reading about web scraping and I understand the process; I just don't know how to code it. From what I have read, I need to write code that automatically opens the website, waits until it has loaded, then retrieves information by tag, name or id. Is this correct? How can I go about this, and what resources should I use?
TL;DR How to web scrape text from a webpage of search results (noob instructions).

I will not tell you all the details; you will have to find them on your own. Some web pages are complicated, some are easy. Others are impossible, especially if the text is displayed not as HTML but in some other form (a picture, Flash, etc.).
It is, however, quite simple to extract data from HTML web pages in Excel. First of all, you want to automate it, so click 'Record macro' on the 'Developer' ribbon. This way all the reproducible steps are recorded, and afterwards you can look at the macro and adjust some steps to suit your needs. I can't, however, teach you how to program VBA here.
While your macro is being recorded, click 'From web' on the 'Data' ribbon. This opens a new web query. Enter the address of the web page you want to read and select (with the little arrow or check mark) as narrow an area around the data you are interested in as possible. You can also explore some of the fine-tuning options in this wizard dialog.
When you are done, click 'Import' and you will have the contents of the web page in some form. If you are lucky, the data you are interested in will always be in the same cells. Then you can read the cells and store the values somewhere (possibly using another macro). If the data are not in the same cells every time you refresh the query, you are out of luck and will have to use some more complicated formulas or macros to find them.
Next, stop the macro you are recording and review the code that was recorded. Experiment with it until you discover what you actually need. Then it is up to you how you want to automate it. The options are many...
Otherwise, Excel may not be the best tool. If I wanted to load an HTML page and extract data from it, I would use a scripting language, e.g. Python, which has much better tools for this than Excel and VBA. There are also tools for converting HTML to XHTML and then extracting data from it as from well-formed XML.
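For reference, a recorded 'From web' query usually boils down to something like the sketch below. The URL and the sheet name are placeholders, not values from the question, and the exact set of properties the recorder writes out varies between Excel versions.

```vba
Public Sub ImportPageAsWebQuery()
    ' Assumed placeholders: replace the URL and sheet name with your own.
    Const PAGE_URL As String = "https://www.example.com/search-results"
    Dim ws As Worksheet
    Set ws = ThisWorkbook.Worksheets("Sheet1")

    ' The "URL;" prefix tells Excel that the connection is a web query
    With ws.QueryTables.Add(Connection:="URL;" & PAGE_URL, Destination:=ws.Range("A1"))
        .WebSelectionType = xlEntirePage      ' import the whole page
        .WebFormatting = xlWebFormattingNone  ' plain text only, no web formatting
        .RefreshStyle = xlOverwriteCells      ' reuse the same cells on each refresh
        .Refresh BackgroundQuery:=False       ' block until the import has finished
    End With
End Sub
```

To narrow the import to one table, the recorder typically switches .WebSelectionType to xlSpecifiedTables and sets .WebTables to the index of the table you ticked in the wizard.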

Below is a very basic example illustrating some of the concepts of web scraping. You should also read up on the other element selectors, such as getElementById, getElementsByClassName and getElementsByName.
Here's some code to get you started.
Public Sub ExampleWebScraper()
    Dim Browser As Object: Set Browser = CreateObject("InternetExplorer.Application")
    Dim Elements As Object 'Will hold all the elements in a collection
    Dim Element As Object 'Our iterator that will show us the properties

    'Open a page and wait for it to load
    With Browser
        .Visible = True
        .Navigate "www.google.com"

        'Wait for the page to load (4 = READYSTATE_COMPLETE)
        While .Busy Or .ReadyState <> 4
            Application.Wait (Now() + TimeValue("00:00:01"))
        Wend

        'Enumerate all Elements on the page
        'It will store these elements into a collection which we can
        'iterate over. The * is the key for ALL, here you can specify
        'any tagName and it will limit your search to just those.
        'E.g. a common one is "input"
        Set Elements = .document.getElementsByTagName("*") ' All elements

        'Iterate through all elements, and print out some properties
        For Each Element In Elements
            On Error Resume Next ' This is needed as not all elements have the properties below
            ' if you try and return a property that doesn't exist for that element
            ' you will receive an error
            'The following information will be output to the 'Immediate Window'
            'If you don't see this window, Press Ctrl+G, and it will pop up. That's where this info will display
            Debug.Print "The Inner Text is: " & Element.innerText
            Debug.Print "The Value is: " & Element.Value
            Debug.Print "The Name is: " & Element.Name
            Debug.Print "The ID is: " & Element.ID
            Debug.Print "The ClassName is: " & Element.className 'the DOM property is className, not Class
        Next Element
        On Error GoTo 0 'Restore normal error handling
    End With

    'Clean up, free memory
    Set Browser = Nothing
    Set Elements = Nothing
    Set Element = Nothing
End Sub
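Once you know which elements hold the data, you can target them directly instead of enumerating everything. A minimal sketch, assuming the listing page marks each address with a class named "listing-address" and the price with the id "list-price"; both names and the URL are invented for illustration, so inspect the real page to find the actual ones.

```vba
Public Sub ReadSpecificElements()
    Dim Browser As Object, Doc As Object
    Dim Addresses As Object, PriceElement As Object
    Dim i As Long

    Set Browser = CreateObject("InternetExplorer.Application")
    Browser.Visible = True
    Browser.Navigate "https://www.example.com/listings" ' placeholder URL

    ' Wait for the page to load (4 = READYSTATE_COMPLETE)
    Do While Browser.Busy Or Browser.ReadyState <> 4
        DoEvents
    Loop

    Set Doc = Browser.document

    ' getElementById returns a single element, or Nothing if the id is absent
    Set PriceElement = Doc.getElementById("list-price") ' hypothetical id
    If Not PriceElement Is Nothing Then
        ActiveSheet.Cells(1, 2).Value = PriceElement.innerText
    End If

    ' getElementsByClassName returns a zero-based collection
    Set Addresses = Doc.getElementsByClassName("listing-address") ' hypothetical class
    For i = 0 To Addresses.Length - 1
        ActiveSheet.Cells(i + 1, 1).Value = Addresses.Item(i).innerText
    Next i

    Browser.Quit
End Sub
```

The addresses land in column A and the price in B1; adjust the target cells to match your own sheet layout.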

Related

Excel VBA / SAP GUI: how to get Value from Cell in SAP to Excel. "Node?"

I have created an Excel VBA Macro to connect to SAP GUI, pull data, etc. Nearly everything works, but I have one problem.
I am using the Transaction CV04N to download some documents from it. My code pastes all required Document names (or numbers) into the multiple selection here:
After executing it, we get this list:
Now my code just double clicks the list one-by-one and this opens up:
In most cases there is only one PDF file in here, but sometimes there is also a TIFF file, which then produces an error because the program tries to download it as a .pdf.
I only want the PDF, but my program always just selects the first entry.
So I need a function/routine that reads what is in the first line and, if it is not a PDF, takes the next one. (There are never 2 PDFs in there, so taking the first PDF that shows up is sufficient.)
If I choose only documents that contain nothing but PDFs, everything runs normally.
My current code looks like this (starts from the window shown in last picture)
For j = 0 To k - 1
    On Error Resume Next
    FileName = XYZ
    SaveName = DlFolder & FileName
    Session.findById("wnd[0]/usr/cntlGRID1/shellcont/shell").currentCellRow = j
    Session.findById("wnd[0]/usr/cntlGRID1/shellcont/shell").doubleClickCurrentCell
    Set Tree = Session.findById("wnd[0]/usr/tabsTAB_MAIN/tabpTSMAIN/ssubSCR_MAIN:SAPLCV110:0102/cntlCTL_FILES1/shellcont/shell/shellcont[1]/shell")
    Tree.selectNode " 1"
    Tree.nodeContextMenu " 1"
    Tree.selectContextMenuItem "CF_EXP_COPY"
    'It has selected the "copy to" in context menu, now just saves it to Folder saved in "Savename"
    Session.findById("wnd[1]/usr/ctxtDRAW-FILEP").Text = SaveName
    Session.findById("wnd[1]/tbar[0]/btn[0]").press
    'goes to next doc:
    Session.findById("wnd[0]/tbar[0]/btn[3]").press
Next
So, if I could get the data from the Table, I could select the one with TIFF, for that I need to read the table.
I have tried
Tree.Text
Tree.Value
Tree.Copy (and then paste in Excel)
But nothing gives me the correct value...
When I select the entry, and press CTRL-C and paste it somewhere it gives me the whole line, so a String with all columns in this entry.
If you have a solution just to get that mentioned String into a Excel Cell, that's fine with me! From there I can set up some routine to make it work.
I hope I made it understandable what I want, if not please feel free to reach out to me!

Using user input in VBA code and extracting data into workbook

I am exploring web scraping to try to improve efficiency when inputting data. Unfortunately the website I wish to extract data from is not in tabular format, so I wish to use VBA to manipulate the website to get the desired result.
I'm not very familiar with coding/VBA but so far I have got VBA to open a website and search for a provided value. In this case the CAS number 67-64-1 refs Acetone on the website.
The code for this is:
Sub BrowseToSite()
    Dim IE As New SHDocVw.InternetExplorer
    IE.Visible = True
    IE.Navigate "https://apps.who.int/food-additives-contaminants-jecfa-database/Search.aspx#"
    Do While IE.ReadyState <> READYSTATE_COMPLETE
    Loop
    IE.Document.forms("form1").elements("ctl00$ContentPlaceHolder1$txtSearch").Value = "67-64-1"
    IE.Document.forms("form1").elements("ctl00$ContentPlaceHolder1$btnSearch").Click
End Sub
Ultimately I wish to create a list in an excel sheet of CAS numbers that this code can loop through and return either the found phrase (in this case No safety concern at current levels of intake when used as a flavouring agent) or simply return a "Not Found". Sometimes the search returns multiple results, for the time being I just wish to take the first result.
This raises 2 problems I'm not sure how to solve:
1. How can I modify my code to loop through values within a column of a worksheet instead of having to explicitly give each one?
2. I'm unsure how to pull the data into the adjacent column.
Below is an image of the desired output. Column A is inputted by the user and hopefully column B is created by VBA code.
Any help would be appreciated.
What you need is a step-by-step process for web scraping.
I highly recommend you get familiar with SeleniumBasic for VBA: https://florentbr.github.io/SeleniumBasic/
Loop over your Excel rows using Range() or Cells(i, 1) to read the input value in each row.
Check the number of search results by counting the items in the returned element collection.
Save as many results as you wish next to the input row using Cells(i, k), with k running over the returned search results.
Unfortunately the website did not load for me, so I cannot help you further.
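The row loop itself can be sketched independently of the scraping step. The version below assumes the CAS numbers sit in column A of a sheet named "Sheet1" with a header in row 1. LookupCas is a hypothetical placeholder, not part of any library: its body is where the browser automation from the question (or a SeleniumBasic equivalent) would go.

```vba
Public Sub LoopCasNumbers()
    Dim ws As Worksheet
    Dim lastRow As Long, i As Long

    Set ws = ThisWorkbook.Worksheets("Sheet1") ' assumed sheet name

    ' Last used row in column A
    lastRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row

    For i = 2 To lastRow ' assumes row 1 holds the column headers
        ' Read the input from column A, write the result next to it in column B
        ws.Cells(i, 2).Value = LookupCas(CStr(ws.Cells(i, 1).Value))
    Next i
End Sub

' Hypothetical helper: replace the body with the code that searches the
' website for the given CAS number and returns the first result's text.
Private Function LookupCas(ByVal cas As String) As String
    LookupCas = "Not Found" ' default when the search returns nothing
End Function
```

Keeping the lookup in its own function means the loop does not change when the scraping method does.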

Retrieve website table to import into Excel

I am learning how to retrieve tables from websites into excel.
I usually use the web query function, but sometimes excel doesn't see these tables.
So I am trying to write VBA queries by reading lots of posts, but I have limited knowledge and would appreciate your help.
Website link :
https://www.zacks.com/stocks/industry-rank/sectors/?icid=stocks-industry-nav_tracking-zcom-main_menu_wrapper-zacks_sector_rank
Here's what I wrote so far :
Public Sub GetInfo()
    Dim IE As New InternetExplorer, clipboard As Object
    With IE
        .Visible = True
        .navigate "https://www.zacks.com/stocks/industry-rank/sectors/?icid=stocks-industry-nav_tracking-zcom-main_menu_wrapper-zacks_sector_rank"
        While .Busy Or .readyState < 4: DoEvents: Wend
        Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
        With .document.getElementById("industry_sector_data_table").contentDocument
            ThisWorkbook.Worksheets("Sheet1").Cells(1, 1) = .getElementsByClassName("industry_sector_data_table")(0).innerText
            clipboard.SetText .getElementsByTagName("table")(2).outerHTML
            clipboard.PutInClipboard
        End With
        ThisWorkbook.Worksheets("Sheet1").Cells(2, 1).PasteSpecial
        .Quit
    End With
End Sub
Here's a picture of the site where the table is located :
First, I need to click on the Earnings tab (no idea if that is even possible).
Second, how do I set the number of entries to 100? This is for other tables I need to retrieve in the future; this one doesn't actually need it.
Third, the location of the table. I have "industry_sector_data_table" as the table id, but I'm not sure how to actually get the "td" values, in this example 80.77%, 83.33% and all the other ones.
The table seems to be number 2.
I also read in one post that there is a way to do this without opening the web browser, though I'm not sure I understood that correctly. Is it possible?
Hope anybody can help from here.
Thanks

Extract href link from source code using VBA

Below is the source code that I get after browsing to a website:
<item><a href="/search/Listing/45678489?source=results" id="mk:0:mk" class="details">
I just want to copy the link /search/Listing/45678489?source=results into Excel, and I want to know how to click it.
class="details" is the same for all the href links that I want to copy, while the id keeps incrementing: mk:1:mk, mk:2:mk and so on.
So, on each page you can gather the current set of links in a list, but looking at your example you will need to prepend the protocol and domain to each URL before writing it out to Excel. I wouldn't try clicking the written-out links (hyperlinks, presumably), as this is inefficient and will spawn lots of IE instances that you will need to remember to close manually.
On any given page, grab the list of links and generate a full URL in each case:
Dim nodes As Object, i As Long
Set nodes = ie.document.querySelectorAll(".details[id^='mk:']")
With ActiveSheet
    For i = 0 To nodes.Length - 1
        .Cells(i + 1, 1) = "protocol + domain...." & nodes.item(i).href
    Next
End With
Then later, rather than clicking, read those urls into an array, loop the array and either issue xmlhttp requests if possible, or .Navigate with IE to the current url in the array.
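A rough sketch of that later step, assuming the full URLs were written to column A of the active sheet starting in row 1 and that the target site answers plain GET requests without needing JavaScript. Writing the HTTP status into column B is only there for illustration; in practice you would parse http.responseText instead.

```vba
Public Sub FetchListedUrls()
    Dim urls As Variant, http As Object
    Dim lastRow As Long, i As Long

    With ActiveSheet
        lastRow = .Cells(.Rows.Count, 1).End(xlUp).Row
        ' A single cell returns a scalar rather than an array, so this
        ' sketch simply requires at least two URLs
        If lastRow < 2 Then Exit Sub
        urls = .Range(.Cells(1, 1), .Cells(lastRow, 1)).Value ' 2-D, 1-based array
    End With

    Set http = CreateObject("MSXML2.XMLHTTP")
    For i = LBound(urls, 1) To UBound(urls, 1)
        http.Open "GET", CStr(urls(i, 1)), False ' False = synchronous request
        http.send
        ' Record the HTTP status next to each URL; the page source itself
        ' is available in http.responseText
        ActiveSheet.Cells(i, 2).Value = http.Status
    Next i
End Sub
```

Reading the whole column into an array in one go avoids touching the worksheet once per URL, and no browser window is opened at any point.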

How to fix run time error 400 which occurs only in shared mode of excel via VBA code

I really don't know what causes error 400.
The code below runs perfectly fine in normal mode, but as soon as I put my workbook into shared mode and try to use the user form, it gives me VBA error 400.
What I am trying to do is change a shape's text and disable its OnAction event once the user form is shown, so that another user accessing the same file will know that someone is already using the user form to enter data.
Dim shp As Shape
For Each shp In ActiveSheet.Shapes
    If shp.TextEffect.Text = "Sort Customer" Then
        shp.OnAction = ""
        shp.TextEffect.Text = "Wait!!!"
    End If
Next
Q. Is there any way to publish changes made by any user in a shared Excel workbook automatically?
I suspect that your code falls in one of the numerous limitations of Excel shared mode, described here (see unsupported features), including
Using a data form to add new data
Using drawing tools
Inserting or changing pictures or other objects
(Please note that, due to its format, I could not easily copy that list of unsupported features in my answer.)
As far as I know, in order to keep the changes you must decide whether the changes being saved win automatically or whether you will choose manually in case of conflict. As you are looking for an "automatic" way, you should choose the first option.
You can find a good explanation described here
Go to Review > Share Workbook, Advanced tab. Under "Conflicting changes between users", choose "The changes being saved win". That way, changes are reflected as soon as the data is entered and saved.
Hope it helps.
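The same dialog settings are also exposed as Workbook properties, so they can be set from code. A minimal sketch, which I have not verified across Excel versions; the 5-minute interval is an arbitrary example value.

```vba
Public Sub PreferSavedChanges()
    With ThisWorkbook
        ' MultiUserEditing is True only while the workbook is shared
        If .MultiUserEditing Then
            ' Equivalent of "The changes being saved win"
            .ConflictResolution = xlLocalSessionChanges
            ' Pull in other users' changes every 5 minutes (example value)
            .AutoUpdateFrequency = 5
        End If
    End With
End Sub
```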
Create a VBA function in the code of the sheet (NOT in a module) from which users open the user form.
Insert the following function there:
Function HyperlinkClick()
    'source: https://stackoverflow.com/a/33114213/11971785
    Set HyperlinkClick = Range("B2")
    If HyperlinkClick.Value = "Sort Customer" Then
        'sets info on WAIT
        HyperlinkClick.Value = "WAIT!!!"
        'shows userform
        UserForm1.Show
    Else
        'sets info back to normal value
        HyperlinkClick.Value = "Sort Customer"
    End If
End Function
In the user form you can add a UserForm_Terminate event handler, which automatically changes the value in B2 back (I guess you could also do the same in a workbook close event, to be on the safe side).
Private Sub UserForm_Terminate()
    'Code goes here
    Range("B2").Value = "Sort Customer"
End Sub
In Excel now create a "Frontend" such as this:
and add the formula:
=HYPERLINK("#HyperlinkClick()";"Click")
to the cell where a user needs to click to open the UserForm (in this case to D2).
If you now share the workbook and click on "Click" in D2, an event is triggered and the VBA function HyperlinkClick() is called. In this function you can do essentially anything.
Explanation:
Instead of using a graphic, button, etc., which will not work correctly in shared mode, we can simply use links (which do work) to trigger an event.
Instead of "creating" and "deleting" hyperlinks (which also does not work in shared mode), we simply build dynamic links which point to UserForm.Show or to nothing, depending on the situation.
Error 400 problem: should be solved by skipping the part of the code that modifies the object.
Multiple-user problem: should be solved, since only one user can activate the user form.
Is there any way to publish changes made by any user in shared Excel automatically?: I guess so; please provide more information on what exactly you want to achieve (incl. an example).
Tip:
In general, you might want to check out MS Access, since multi-user access is a default feature there and users can use the same form at the same time: users only get exclusive access to specific data points, not the whole table, workbook or file.

Resources