Need to combine .getElementsByClass and .getElementsById to scrape data from website - excel

I am writing a simple crawler using VBA. I found out that the data I am looking for corresponds to the node <h6 class="country-name" id="Australia">.
I know that if I wanted to select data from, say, <div class="section-country">, I should use .getElementsByClassName("section-country") in my VBA macro.
When the node has both a class and an id, which method should I use in my VBA macro to scrape the data?
Thanks a lot,
Avitus
EDIT: writing .getElementsByClassName("country-name").getElementsById("Australia") gives me an error. Why?

getElementsByID (plural) doesn't exist, because there should only be one element with a given ID. Use getElementById (singular) instead, which does exist. If multiple elements happen to share the same ID, it returns the first one. Note also that getElementsByClassName returns a collection rather than a single element, so another lookup can't be chained directly onto its result.
As others have said, selecting by ID sounds more appropriate for what you want to do than selecting by class.
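The idea of combining the two conditions (find the element by its id, then confirm it also carries the class) can be sketched as follows. This is not VBA: it is a minimal Python illustration using the standard library's html.parser, and the sample HTML and the helper name text_by_id_and_class are hypothetical. In a VBA/MSHTML macro the equivalent would be getElementById followed by a check of the element's className.

```python
from html.parser import HTMLParser

# HTML elements that never have a closing tag, so they must not change depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class IdAndClassFinder(HTMLParser):
    """Captures the text of the first element whose id AND class both match."""

    def __init__(self, element_id, class_name):
        super().__init__()
        self.element_id = element_id
        self.class_name = class_name
        self.depth = 0       # > 0 while we are inside the matched element
        self.found = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # a tag nested inside the matched element
            return
        if self.found:
            return           # only the first element matching both is captured
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        # Same idea as getElementById followed by a className check.
        if attrs.get("id") == self.element_id and self.class_name in classes:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def text_by_id_and_class(html, element_id, class_name):
    """Return the matched element's text, or None if nothing matches both."""
    finder = IdAndClassFinder(element_id, class_name)
    finder.feed(html)
    finder.close()
    return "".join(finder.parts).strip() if finder.found else None


page = ('<div class="section-country">'
        '<h6 class="country-name" id="Australia">Australia <b>AU</b></h6>'
        '<h6 class="country-name" id="Austria">Austria</h6>'
        '</div>')
print(text_by_id_and_class(page, "Australia", "country-name"))  # Australia AU
```

Because an id should be unique, the id does the real narrowing; the class check only guards against picking up an element that has the right id but the wrong role.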

If your tool offers an XPath method (something like getElementByXPath), you can use this XPath: "//h6[@class='country-name' and @id='Australia']"
e.g.: getElementsByXpath("//h6[@class='country-name' and @id='Australia']")
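For comparison, here is how that kind of XPath behaves when evaluated with Python's xml.etree.ElementTree, sketched under stated assumptions. In XPath, attributes are written with @, and the node from the question is an h6. ElementTree only supports a subset of XPath without the and operator, so the sketch stacks two predicates, which selects the same nodes. It also assumes the fragment is well-formed XML; real-world HTML often is not, in which case an HTML-aware parser is needed.

```python
import xml.etree.ElementTree as ET

# Assumption: the page fragment is well-formed XML (real HTML often is not).
page = """
<body>
  <div class="section-country">
    <h6 class="country-name" id="Australia">Australia</h6>
    <h6 class="country-name" id="Austria">Austria</h6>
  </div>
</body>
"""

root = ET.fromstring(page.strip())

# ElementTree's XPath subset has no "and"; stacking two predicates selects
# the same nodes as //h6[@class='country-name' and @id='Australia'].
matches = root.findall(".//h6[@class='country-name'][@id='Australia']")

for el in matches:
    print(el.tag, el.get("id"), el.text)  # h6 Australia Australia
```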

Related

Kentico 12 Azure Search

I'm trying to implement Azure Search on Kentico 12, following the article below.
https://docs.kentico.com/k12/configuring-kentico/setting-up-search-on-your-website/using-azure-search/integrating-azure-search-into-pages
However, I have multiple indexes defined in Smart search, not just a single index whose code name I can hard-code, and I also cannot afford to hard-code the index fields. Is there a tutorial out there that I can follow?
It sounds as if you're referring to building an Azure Search web part; is this correct? If so, create a property in your web part that lets you select the index code name from a list in the database. Secondly, regarding field names, you should be using generic field names like DocumentName, NodeAliasPath, etc. If you have very specific search results that need to be displayed, simply use a switch statement to get the field names based on the class name.
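The "generic fields plus a switch on the class name" suggestion can be sketched like this. It is a Python illustration of the lookup logic only, not Kentico or C# code: the class names custom.news and custom.product and their field names are hypothetical placeholders, and the dictionary plays the role of the switch statement.

```python
# Generic page fields, as suggested in the answer.
GENERIC_FIELDS = ["DocumentName", "NodeAliasPath"]

# Hypothetical page-type class names and field names, for illustration only.
FIELDS_BY_CLASS = {
    "custom.news": ["NewsTitle", "NewsSummary"],
    "custom.product": ["ProductName", "ProductPrice"],
}


def result_fields(class_name):
    """Stand-in for the 'switch on class name': generic fields first,
    then any type-specific ones (none for unknown class names)."""
    return GENERIC_FIELDS + FIELDS_BY_CLASS.get(class_name.lower(), [])


print(result_fields("Custom.News"))
print(result_fields("Some.OtherType"))
```

A dictionary lookup is the usual Python stand-in for a switch; adding a page type then means adding one entry rather than another branch.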

Search in divs within a DataTable

I'm currently working on a project in which I included partial views in every single row of a DataTable.
It didn't use to be in a DataTable at all, but I thought it would be a good idea so I could use the search function.
But of course, the partial view contains many text spans, even an external JS library called Summernote, which itself includes lots of hidden text spans, and the search results include all the text within the outer div. Needless to say, this isn't as accurate as I thought it would be.
I've seen that the search plugin lets you filter, for example, every div that has some class, but what I'm looking for is a functionality that filters on the text contained in divs that have some class. This way I could ignore all other irrelevant text, hidden or not.
Is it even remotely possible?
Thanks
There is no built-in magic that lets you filter for a div containing some text while also requiring that the div has a certain class, but you can write such a filter yourself or use data attributes. Here is something that might be useful: filter multiple data attributes
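The core of a hand-written filter like this is to decide which text counts: only the text inside elements carrying a chosen class. Below is a minimal sketch of that logic in Python (not the DataTables JavaScript API), using the standard library's html.parser. The class names searchable, partial-view and editor-internal are hypothetical, and wiring the result into DataTables (for example via data attributes) is not shown.

```python
from html.parser import HTMLParser

# HTML elements that never have a closing tag, so they must not change depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassTextCollector(HTMLParser):
    """Keeps only the text found inside elements that carry class_name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0   # > 0 while inside a matching element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif self.class_name in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.parts.append(data.strip())


def searchable_text(cell_html, class_name="searchable"):
    """All text inside elements with class_name, joined by spaces."""
    collector = ClassTextCollector(class_name)
    collector.feed(cell_html)
    collector.close()
    return " ".join(collector.parts)


def row_matches(cell_html, query, class_name="searchable"):
    """Case-insensitive 'contains' test against the relevant text only."""
    return query.lower() in searchable_text(cell_html, class_name).lower()


cell = ('<div class="partial-view">'
        '<span class="searchable">Quarterly report</span>'
        '<span class="editor-internal" style="display:none">toolbar bold italic</span>'
        '<p>Owner: <span class="searchable">Alice</span></p>'
        '</div>')
print(searchable_text(cell))      # Quarterly report Alice
print(row_matches(cell, "bold"))  # False: text outside .searchable is ignored
```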

XPages dynamic ID for data update or validation (CSJS)

Before describing the problem, I would like to add that I have looked for this problem on Google and found many solutions, but none related to XPages. Since XPages has its own way of generating field IDs, I am facing this problem. I have also posted the same question on the IBM forum and am awaiting a reply there (http://www-10.lotus.com/ldd/ndseforum.nsf/xpTopicThread.xsp?documentId=EDEBB14BDF2E804F85257BE8001E4F76):
Problem:
I am trying to pass a dynamic ID to the standard getElementById function, with no success. To explain it clearly: I have front-end validation for specific fields. These fields are stored in a database, so I have a for loop running over all the fields. The problem is that XPages generates the IDs dynamically, so for the same form, if there is a hierarchical tabbed panel, the ID also includes the tabbed panel's ID.
Here is the code view of the problem:
The standard method to retrieve the value(CSJS) is:
document.getElementById("#{id:inputText1}").value;
However, if I try to pass in a dynamic ID, it doesn't work. I have tried any number of approaches I found on Google, but none addresses this problem. One solution I tried was:
var x = "inputText1";
document.getElementById("#{id:"+x+"}").value;
Any help would really be appreciated. Really eager to hear some good suggestion.
The "#{id:inputText1}" part is computed on the server before the page is served, so by the time the client-side JS runs it is too late to set the ID there.
To compute the ID in SSJS instead, you can do this:
document.getElementById("#{javascript:var x='inputText1'; getClientId(x)}").value;
With getClientId you can also build a CSJS array of IDs in SSJS and then loop over that array in CSJS. You would build the array like this:
var strIDs = ${javascript:'["a","b","c"]'};

Alternative methods to search, without using FT Search

I am currently using the 'search in view results' option in the view control to provide the data set for my view (the reason for this is that the data set to be displayed is fairly complex depending on the user - and I was not able to accomplish this using vector filtering).
The problem is that this is a full-text (FT) search, so it does not let me search for a field that exactly matches a string; it only finds fields that contain the string.
Does anyone know of a method for searching the view for exact data?
Thanks in advance.
A
If your database is not too big, you could use database.search. It uses an @Formula to select the documents, but it may be an order of magnitude slower than an FT search.
Take a look at this code http://openntf.org/XSnippets.nsf/snippet.xsp?id=build-a-search-query I think it could help you do what you are looking for.
Based on what you want to do, a better option is to create a hidden view with the columns you need to match on, then search that view rather than using an FTI search.

Why does OnWorkflowItemChanged behave differently for a list and a document library?

I am writing a workflow for a document library. I added an OnWorkflowItemChanged activity, and I want to get the value of the column that changed. I use workflowProperties.Item["name"] and the afterProperties. But when I use workflowProperties.Item["column name"], I still get the original value, and when I use the afterProperties, they are null.
Then I made an identical workflow for a list. There, I can use workflowProperties.Item["column name"] to get the new value in OnWorkflowItemChanged.
Has anyone come across this problem before? Can you give me some help?
The question seems to mix up Item with ExtendedProperties. As for why a list and a document library behave differently, it might have something to do with versioning, or perhaps the internal serialization is different. In any case, some of my experience is outlined below; I hope it is of use:
Use the GUID (as a Guid object, not a string) to access the Before / After ExtendedProperties field. Using the Display Name in the ExtendedProperties will not work. The documentation on it is wrong. You can use SPList.Fields to go from Display Name to Column ID (Guid).
I bind all "Before" properties to MyWhatever_PreviousProperties and all "After" properties to MyWhatever_Properties, and only access MyWhatever_[Previous]Properties after the appropriate event(s).
