Pulling data into Excel from multiple pages

I’m currently trying to pull data from an internal website. However, a few challenges stand in my way. For clarity, I’ve attached a screenshot of the interface I’m working with. I’ve removed all of the text and added my own references for confidentiality purposes.
The data I need to pull is in Tab1 > TabD. I then apply a filter to it (not sure if that's relevant here). In this example there are 16 pages, and it is these 16 pages of data (headers 1 to 5) that I need to pull into an Excel sheet.
There's no API for this, and the page number doesn't change in the URL, so it can't be used (to my knowledge).
With all these conditions, is this even feasible with VBA?
Thank you all for your time.

I would just comment on this, but I don't have enough reputation to do so. Come on Stack Overflow, I want to help people! I'll leave an answer instead, though I may need more information.
I am assuming this is done in Internet Explorer, in which case pressing Ctrl+U brings up the source for a page. Bring up the source of the page shown in your screenshot. You'll need to look for a JavaScript button that changes the page, which is the tricky part. The syntax for such a button in JavaScript looks like this:
<button onclick="functionToRun()">Button Text</button>
The button in the example above runs the function functionToRun. Once you find the function that changes the page, insert it into the VBA line below (after you have IE properly initialized in your script):
Call IE.document.parentWindow.execScript("functionToRun()", "JavaScript")
The line above runs the JavaScript function in IE, effectively changing the page.
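To put it all together, here is a minimal sketch of the whole approach, assuming late binding and assuming the paging function turns out to be called functionToRun; the URL, the function name, and the fixed page count of 16 are placeholders taken from the question, not known values:
Sub PullAllPages()
    Dim IE As Object
    Dim pageNum As Long

    ' Late-bound IE instance; alternatively add a reference to
    ' "Microsoft Internet Controls" and declare InternetExplorer directly.
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True
    IE.navigate "http://your-internal-site/"   ' placeholder URL

    ' Wait for the initial page load (readyState 4 = complete)
    Do While IE.Busy Or IE.readyState <> 4
        DoEvents
    Loop

    For pageNum = 1 To 16   ' 16 pages, per the question
        ' ... read the current page's table (headers 1 to 5) into the sheet here,
        ' e.g. via IE.document.getElementsByTagName("table") once you know the structure ...

        ' Trigger the site's own JavaScript paging function
        Call IE.document.parentWindow.execScript("functionToRun()", "JavaScript")

        ' Crude wait for the page to re-render; adjust to taste
        Application.Wait Now + TimeValue("0:00:02")
    Next pageNum

    IE.Quit
    Set IE = Nothing
End Sub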

Related

Creating a new GitLab issue and assigning a label to it automatically

I am working on designing a webpage UI where some experimental data is stored. This data can sometimes be inaccurate, so I'm providing a button at the bottom of the page that redirects the user to the new-issue page in GitLab.
The URL behind the button is simply
<full-path-to-some-gitlab-repo>/issues/new?issue[title]=Issue%20with%20experiment%20%201
which was taken from the official GitLab documentation.
As you can see, there's an option to automatically fill the Title field directly from the URL, but I couldn't figure out how to do the same for Labels. Is there any way to do this?
For internal requirements, a label must be selected automatically; the users can't be relied on to select it themselves. Each webpage is assigned its own label, which makes it possible to later extract all the issues related to that webpage just by filtering on that label. This might not be an optimal way to do this, so if you have any other suggestions, please put them in the comments. Thanks.
Pre-filling labels on issues does not seem to be supported yet.
That was requested in issue 63392, but there is no solution for now.

Web Scraping into Excel

I would like to create a spreadsheet that I can refresh to pull in each week's English Premier League fixtures, so that every week I can refresh it and see the upcoming fixtures. I have tried to use the import function from Data > From Web and selected the box with the table of fixtures, but no data gets pulled into the spreadsheet.
The website I am using is http://data.7m.com.cn/matches_data/92/en/index.shtml
I am open to a better way of doing this import, and if there is a better website to use I am happy to change; I chose this one as it seems to have the simplest listing of the fixtures.
I have also tried https://www.premierleague.com/fixtures; when the import completes, it actually skips all the fixtures and returns all the other information.
Should I be looking at some of the HTML elements within the script of the web page to extract the data?
For example, on https://www.premierleague.com/fixtures I am looking for a file received by the website that updates the fixtures each week. After some direction from Google, I hit F12 and looked within the "Network" tab, but I can't understand how this website (or the others quoted) creates the weekly fixtures.
Any suggestions on how to pull this into Excel or another tool would be fantastic.
Welcome to Stack Overflow! It sounds like you haven't done as much research as you could have. Your first link has, in the top corner, links to "Free Feed", which take you to customizable widgets, and from there there is a link to a customizable live template. The first page also has a link to "Data"; I'm not sure what that consists of or whether it will help (since I'm not much of a sports fan on my continent, and even less on yours!).
As for importing into Excel, I didn't have an issue with the table I could see, but once again I'm not clear on what data you're trying to get and what you want to do with it.
On the ribbon's Data tab, click From Web.
Enter the first URL from your question and hit Enter.
When the Navigator window loads, click "Table 1" and then click Load.
Excel then automatically loads the data as a table.
If instead of clicking Load you click Edit, you are brought into the Power Query Editor, where you can customize tons of stuff. The option I was interested in was Use First Row as Headers. After choosing that, clicking Close & Load, and 30 seconds of formatting, I had a clean table of fixtures.
With Power Query you can choose, remove, split, or combine columns from this or other tables. It's fairly advanced, but you should be able to find a good Power Query tutorial online to see examples of what you can do and to learn about other ways to customize the import and/or analysis of the data.
Edit:
More Information:
Here are the instructions for all versions:
Office Support: Connect to a web page (Power Query)
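If you'd rather drive the import from VBA than from the ribbon, here is a minimal sketch using the legacy web-query interface; the URL is the first one from the question, while the destination range and the table index are assumptions you'd need to verify:
Sub ImportFixtures()
    Dim qt As QueryTable

    ' Legacy web query pointed at the fixtures page from the question
    Set qt = ActiveSheet.QueryTables.Add( _
        Connection:="URL;http://data.7m.com.cn/matches_data/92/en/index.shtml", _
        Destination:=ActiveSheet.Range("A1"))

    With qt
        .WebSelectionType = xlSpecifiedTables
        .WebTables = "1"          ' which table on the page; may need adjusting
        .RefreshOnFileOpen = True ' re-pull the fixtures whenever the workbook opens
        .Refresh BackgroundQuery:=False
    End With
End Sub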

Is there a way to scrape the dataLayer available on the page instead of the regular HTML elements in Excel VBA?

I am trying to fetch some data from web pages using Excel VBA and have been fairly successful.
However, I have realized that most of the pages have a dataLayer available, and if I were able to use it, a lot of the effort of massaging the data into a usable format could be avoided.
I tried to access the dataLayer with the Document.getElement method, but this does not seem to work.
I am not a hardcore developer, just able to swim for my needs, so please let me know whether this is possible, as all of my searches so far have yielded nothing.
I normally use SeleniumBasic (an Excel plugin) for web-scraping needs. The way it can be done is by using
driver.ExecuteScript("return dataLayer[x].variableName")
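For context, here is a minimal sketch of how that call fits into a full SeleniumBasic routine; the URL, the dataLayer index, and the key name are assumptions for illustration:
Sub ReadDataLayer()
    ' Early binding: add a reference to "Selenium Type Library"
    ' (installed with SeleniumBasic).
    Dim driver As New Selenium.ChromeDriver
    Dim result As Variant

    driver.Get "https://www.example.com/some-page"   ' placeholder URL

    ' Ask the browser to evaluate JS and hand the value back to VBA
    result = driver.ExecuteScript("return dataLayer[0].pageName")
    Debug.Print result

    driver.Quit
End Sub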

View. Show values as Links. Strange behaviour

An XPage (listPostits.xsp) has a "View" container control where one of the columns is set to "show values in this column as links".
Now, here comes the strange behaviour.
When I work with this application on my own (developer) PC (Win XP, Chrome or IE), Domino generates a link which can't really be processed:
/servername/db/postit/postit.nsf/listPostits.xsp/onePostit.xsp?documentId=many_numbers&action=editDocument
Namely, the listPostits.xsp portion shouldn't be there! That portion is the name of the XPage the View control is in.
When I work with the application from another PC (Mac, Firefox), I get the correct link (the same as above, but without the XPage name in between):
/servername/db/postit/postit.nsf/onePostit.xsp?documentId=many_numbers&action=editDocument
Update: let us leave aside for the moment the differences in the generated links between the two machines. The first question is: why is the extra portion inserted into the automatically generated link?
After playing around, I think I might have found the reason for this strange behaviour: the "Substitution" rules on the server side. One of them substitutes "*/postit/all" with "/db/postit/postit.nsf/listPostits.xsp".
If I switch it off, the links are generated properly. Still, it's pretty strange to me that these settings influence the way Domino generates links. I thought the rules were applied on the fly and had nothing to do with how links are generated inside the application.
So, help is now needed regarding the Web Site Rules topic, but for that, I guess, I have to create another question. In case somebody has some good info on this, please share it with me. I'm a bit confused at the moment :)
Final update: I spent some more hours testing, and the results confirmed the initial idea.
If I open the page with the standard URL, i.e.
http://servername/db/postit/postit.nsf/listPostits.xsp, then everything is fine and the links are generated properly. However, when I open the same page with the short URL http://servername/postit/all, the server adds the substitution URL (db/postit/postit.nsf/listPostits.xsp) to every single link it generates automatically for opening/editing the underlying document.
Is it a bug or a feature? I don't know.
As a workaround (because I want to keep simple URLs for the application), I have to generate the links manually.

How to add text to any HTML element?

I want to add text to the body element, but I don't know how. Which method will work on the body tag?
Sorry for my English, and thanks for the replies.
In Watir, you can manipulate a web page (the DOM) using JS, like this:
browser.execute_script("document.getElementById('pageContent').appendChild(document.createTextNode('Great Success!'));")
I assume that the point of the question is this:
Not all users interact with the web app just by clicking buttons and links; some of them do nasty things like altering HTTP requests to make your system do something it is not supposed to do, or just to have some fun.
To mimic this behavior, you could write a UI test that alters forms on the web page so that, for example, one could type anything into any field instead of being limited to a dropdown.
To do that, the UI test has to:
manipulate the DOM to set form inputs free of limitations (replace selects with inputs, etc.)
know which values to use; in many cases it's pointless to enter random values, so your webapp has to provide some good "unwanted" options.
Why would you want to modify the webpage in Watir? It's for automated testing, not DOM manipulation.
If you want to add something to a DOM element in JavaScript, you can do it like this:
var txt = document.createTextNode(" This text was added to the DIV.");
document.getElementById('myDiv').appendChild(txt);
Or use some DOM manipulation library, like jQuery.
If you have not worked your way through the Watir tutorial, I would suggest you do so. It deals with things like filling in text fields etc.
Learn to use the developer tools for your browser: Firebug for Firefox, or the built-in tools for IE and Chrome. They will let you look at things as you interact with the site.
If the element is not a normal HTML input field of some sort, then you are dealing with a custom control. Many exist, they are varied, and there is no one set solution for dealing with them. Without knowing which control you are using, and without being able to interact with a sample of it ourselves, or at least see the HTML, it is very difficult to advise you; we basically have to guess (which is often a waste of everyone's time).
Odds are that if you have a place where you can enter text, it is some form of input control. It might not start out that way; you may need to click on some other element to make the input area appear, but without a sample of the HTML all we can do is guess.
If this is a commercial control, see if you can find a demo site that shows the control in action. Try googling things like the class names of the elements; you often get lucky.
