I am working on a project that analyzes the HTML of specific websites. I can use PhantomJS to get the source of a single page, but I'd like to get the source of many websites from a single search term.
Say I search for "Music CDs" on Google. I'd like the application to click a result, save its source, then click the next result and save that source too. I'd also like it to move on to the next page of results and do the same on every subsequent page.
Any advice on tools that may help with this, or guidance on how I can program it myself would be greatly appreciated.
I would like to print the criteria I used for a Saved Search in the footer of its PDF. I have tried making an Advanced PDF template to handle this, but I cannot find a field that pulls the criteria into either the Saved Search or the Advanced PDF. Any assistance would be greatly appreciated.
Edit: I have created a Saved Search in NetSuite that displays all of the Inventory Receipts made the previous week. We have to print this Saved Search and check it against the actual paper receipts to verify counts and receipts. When the Saved Search prints to PDF, it does not show the criteria it was run with, so we cannot prove to Internal Audit that we ran this report for the correct dates. I would like a way to print the criteria for this Saved Search along with a timestamp of when it was run and the user who ran it. Is there a way to pull this additional information into a Saved Search or Advanced PDF?
Add a custom Print button.
The button opens a Suitelet that renders the PDF.
Before you render, load the search and pass search.filters to the addCustomDataSource() API of the N/render module.
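A minimal sketch of the data-preparation step, assuming SuiteScript 2.1 (ES2019 syntax). It reads the search's filterExpression rather than search.filters, because the expression is a plain nested array that already includes the filter values and is easier to turn into readable text. The names describeFilterExpression and buildCriteriaData are hypothetical helpers, not part of NetSuite's API.

```javascript
// True for a single term such as ['trandate', 'within', 'lastweek'];
// false for a group of terms or a 'NOT'-prefixed expression.
function isTerm(expr) {
  return typeof expr[0] === 'string' && expr[0].toUpperCase() !== 'NOT';
}

/**
 * Hypothetical helper: turn a saved search's filterExpression, e.g.
 * [['trandate', 'within', 'lastweek'], 'AND', ['type', 'anyof', 'ItemRcpt']],
 * into one readable line for the PDF footer.
 */
function describeFilterExpression(expr) {
  if (!Array.isArray(expr)) return String(expr);

  if (isTerm(expr)) {
    const [field, operator, ...values] = expr;
    // Operators such as 'isempty' have no values, so drop the empty part.
    return [field, operator, values.join(', ')].filter(Boolean).join(' ');
  }

  // A group: terms and sub-groups separated by 'AND' / 'OR', possibly 'NOT'-prefixed.
  return expr
    .map((part) => {
      if (typeof part === 'string') return part.toUpperCase();
      const text = describeFilterExpression(part);
      // Parenthesize nested groups so the AND/OR precedence stays visible.
      return isTerm(part) ? text : `(${text})`;
    })
    .join(' ');
}

/**
 * Hypothetical helper: the object to pass as `data` to
 * renderer.addCustomDataSource({ format: render.DataSource.OBJECT, alias: 'criteria', data }).
 */
function buildCriteriaData(filterExpression, runBy, runAt) {
  return {
    text: describeFilterExpression(filterExpression),
    runBy: runBy,              // e.g. runtime.getCurrentUser().name
    runAt: runAt.toISOString() // timestamp for Internal Audit
  };
}
```

In the Suitelet, the rough flow would be: load the search with search.load(), add its results with renderer.addSearchResults(), call renderer.addCustomDataSource() with format render.DataSource.OBJECT, an alias such as criteria (the name is arbitrary) and the object above, then return renderer.renderAsPdf(). The Advanced PDF template could then print ${criteria.text}, ${criteria.runBy} and ${criteria.runAt} in its footer. This wiring is untested against a live account.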
OK, I read your edit and your reply to my comment. There are three big things here.
I don't think you should take the development route here. You can schedule the saved search to be emailed automatically once a day, which proves the report came from the same saved search every time. This will save you an enormous amount of hassle.
Developing this as your first script is going to be hard. I'm happy to help, but when I say it's going to be a lot of code, I mean it. See this old post of mine: https://stackoverflow.com/a/61066928/11323304
If you still want to take the development route (which is totally fine), start by emulating the user event code in a custom Suitelet, as I posted above in my answer. You're going to need N/ui/serverWidget, N/search and N/xml. The rest is all in the user event functions and the global context variable.
If all this still goes well over your head, don't sweat it. Comment back and we'll build something step by step. But I highly, highly, highly encourage you to check out NetSuite's automated email capabilities before trying to develop something custom.
I'm trying to collect a list of "https://..." URLs and store them in a CSV file. I can do it manually, for example by copying the URLs from the website of interest and pasting them into Excel one by one, but it's tedious and would definitely take a lot of time.
Can someone suggest a faster way and guide me through it?
If you just need the addresses quickly from one page, you can run this JavaScript snippet in your browser's console: Array.from(document.links).forEach(link => console.log(link.href)). It prints every link on that page. (document.links is an HTMLCollection, which has no forEach method of its own, hence the Array.from.)
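To go straight from the console to a CSV file, a small helper can keep only the https:// links, drop duplicates and quote each value. This is a sketch: linksToCsv is a hypothetical name, and the usage line relies on the copy() utility that the Chrome and Firefox DevTools consoles provide.

```javascript
// Hypothetical helper: build a one-column CSV ("url") from a list of hrefs,
// keeping only https:// links and removing duplicates (first occurrence wins).
function linksToCsv(hrefs) {
  const unique = [...new Set(hrefs)].filter((href) => href.startsWith('https://'));
  // Quote every value and double any embedded quotes, as CSV requires.
  const quote = (value) => `"${value.replace(/"/g, '""')}"`;
  return ['url', ...unique.map(quote)].join('\n');
}
```

Usage in the browser console: copy(linksToCsv(Array.from(document.links, (a) => a.href))), then paste the clipboard into a text editor and save it as links.csv.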
If you want to use Python to scrape the page, I suggest taking a look at this question on Stack Overflow, which uses the Beautiful Soup library.
If the page loads content dynamically with JavaScript, you're probably better off using something like Selenium (see this relevant Stack Overflow question).
I want a report that lists all the links on each page of a website. I tried several tools, but they only give me a flat list of all links without showing which links appear on which page. The website I am reporting on is also very unstructured, so I can't just classify links by the slashes in their URLs. For example, filtering for links that start with https://example.com/blog will not give me all the links on the 'https://example.com/blog' page, because that page can contain links that do not begin with 'https://example.com/blog/'.
What can I do about this?
Thanks.
In Google Analytics, there is no concept of a next page; it only knows the previous page.
This is due to the disconnected nature of the web.
You can, however, use the previous page to trace back to get the data you want.
Instead of looking for all links inside https://example.com/blog, you look for all pages whose previous page is https://example.com/blog.
More detailed explanation
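The reversal can be sketched as a grouping step over exported pageview rows. This assumes each row has previousPagePath and pagePath fields (named after the Universal Analytics dimensions) and that session entrances carry the value (entrance); linksByPreviousPage is a hypothetical helper. Note that this only reveals links that visitors actually followed, not every link present on a page.

```javascript
// Hypothetical helper: group navigation rows by the page the visitor came from,
// giving a report of "page -> pages reached from it".
function linksByPreviousPage(rows) {
  const grouped = new Map();
  for (const { previousPagePath, pagePath } of rows) {
    // Entrances have no referring page on the site, so skip them.
    if (previousPagePath === '(entrance)') continue;
    if (!grouped.has(previousPagePath)) grouped.set(previousPagePath, new Set());
    grouped.get(previousPagePath).add(pagePath);
  }
  // Convert to a plain object of sorted arrays for printing or CSV export.
  const report = {};
  for (const [from, pages] of grouped) report[from] = [...pages].sort();
  return report;
}
```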
On the website I run, we have a single search field where you can enter a name or profession. When you search, you get a page full of results that come from three separate sources.
When you click a result, e.g. John Do, you are taken to his page. That page has a "back to search" link, but it leads to a blank screen.
I want to go back to the actual search results so the person doesn't have to do it all again, but I'm not sure where to start. Any suggestions?
That's a tricky situation.
There are many possible solutions for this issue, but I'll name a few.
Enable page caching (a quick trick, but not suitable for websites that rely on logged-in users). The user can then go back and the form will still show the same results.
Load John Do's page via AJAX and track state with #hash references, so you never reload the page; you just manage the states of the HTML. (This can be done with JS frameworks or React.)
Depending on which platform you're working on, try managing the search variables with the post-redirect-get (PRG) pattern, so the results live at a GET URL that the user can return to.
Hope that helps!
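One way to apply the post-redirect-get idea is to keep the search terms in the URL: the search form's POST redirects to a GET results URL, each result link carries that URL along, and the profile's "back to search" link returns to it. The sketch below only builds and reads those URLs; the paths /search and /people/:id, the from parameter and https://example.com are hypothetical.

```javascript
const SITE = 'https://example.com'; // hypothetical origin, only used to parse relative paths

// The "redirect" in post-redirect-get: after the search form is POSTed,
// send the browser to a GET URL that contains the query.
function searchResultsPath(query) {
  const url = new URL('/search', SITE);
  url.searchParams.set('q', query);
  return url.pathname + url.search;
}

// Each result links to the profile page and remembers the results URL.
function profilePath(profileId, resultsPath) {
  const url = new URL(`/people/${encodeURIComponent(profileId)}`, SITE);
  url.searchParams.set('from', resultsPath);
  return url.pathname + url.search;
}

// The profile page's "back to search" link. Only same-site /search paths are
// accepted, so the parameter cannot be abused as an open redirect.
function backToSearchHref(currentPath) {
  const from = new URL(currentPath, SITE).searchParams.get('from');
  return from && from.startsWith('/search') ? from : '/search';
}
```

On the server, the results page must render its results from the q parameter on a GET request; that is what lets both the back link and the browser's Back button restore them.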
Cheers.
Office JS API for OneNote... Love it, but I am missing some critical things. Can someone comment?
I have a paragraph whose type is RichText, but I could not find anything in the API that exposes the style of the rich text. In my case, I want to know whether it is Heading 1, 2, 3... or Quote, etc.
Same-page linking: in OneNote desktop I can right-click any text and copy a link to that specific paragraph. Clicking that link later takes me directly to that paragraph. However, I did not find an API that navigates directly to a paragraph; the only ones I could find navigate to a page: navigateToPage(page: Page) and navigateToPageWithClientUrl(url: string).
Is that even possible? Also, I noticed these links don't work at all in the web version of OneNote, but that's a different story, I guess.
I am building a (free!) TOC add-in that you can put at the top of your page; it would show all the headings on the page with links to each one. However, the lack of the above capabilities makes it impossible for such a simple add-in to work (or at least I thought it was a very basic and simple one...).
Any help will be greatly appreciated! Like I said, if I get these two issues resolved, the add-in will be available for free.
https://dev.office.com/reference/add-ins/onenote/paragraph?product=onenote
Sounds like a cool add-in!
You can use the getHtml method on richText to get the style. There is an example in this answer.
OneNote Add in: Getting HTML content
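A rough sketch of reading paragraph HTML with getHtml() and deriving a heading level from it. It assumes headings come back as h1 to h6 elements in the returned HTML, which should be checked against real output; only the top-level paragraphs of each outline are visited, and listHeadings and headingLevel are hypothetical names.

```javascript
// Hypothetical helper: pull a heading level (1-6) out of a paragraph's HTML,
// assuming OneNote emits headings as <h1>...<h6>. Returns null otherwise.
function headingLevel(html) {
  const match = /<h([1-6])[\s>]/i.exec(html);
  return match ? Number(match[1]) : null;
}

// Collect { text, level } for the top-level rich-text paragraphs on the active page.
// Must run inside a OneNote add-in, where the OneNote global is available.
async function listHeadings() {
  return OneNote.run(async (context) => {
    const contents = context.application.getActivePage().contents;
    contents.load('type');
    await context.sync();

    // Load the paragraphs of every outline on the page.
    const paragraphCollections = contents.items
      .filter((content) => content.type === 'Outline')
      .map((content) => {
        const paragraphs = content.outline.paragraphs;
        paragraphs.load('type');
        return paragraphs;
      });
    await context.sync();

    // Queue getHtml() and a text load for each rich-text paragraph, then sync once.
    const pending = [];
    for (const paragraphs of paragraphCollections) {
      for (const paragraph of paragraphs.items) {
        if (paragraph.type !== 'RichText') continue;
        paragraph.richText.load('text');
        pending.push({ richText: paragraph.richText, html: paragraph.richText.getHtml() });
      }
    }
    await context.sync();

    return pending
      .map(({ richText, html }) => ({ text: richText.text, level: headingLevel(html.value) }))
      .filter((item) => item.level !== null);
  });
}
```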
As for creating links to a specific paragraph, OneNote add-ins do not expose that capability; the only supported capability is navigating to a page. You can add a request on our UserVoice.
https://onenote.uservoice.com/forums/245490-onenote-developer-apis
As for links that work in OneNote Online, a page's webUrl property contains a link that works there.
https://github.com/OfficeDev/office-js-docs/blob/master/reference/onenote/page.md
Thanks for the feedback. We will update the documentation.
There is currently no way to scroll to any region in the page.