Automation within browser to export HTML elements in an XML file - browser

I have an online store on a marketplace whose back office includes a section listing all my customers. The problem is that I can't export their contact details; I have to copy and paste each customer's info one by one, which is very time-consuming. So I was wondering whether there is a way to automate this task within my browser. Since the contact details are in an HTML list, I'd like to target the specific tags and export them into a single compiled XML file.
Is this possible at all?
[EDIT]
Thank you all for your input, but it seems out of my reach in terms of knowledge. So I've decided to hire a freelancer to perform this task for me.

Selenium might be an option, but you didn't provide many details about the HTML. Does the page have a REST API? If so, maybe you can call it, get JSON back, and parse that.
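If the customer list really is a plain HTML list, the extraction step itself is small once the page source has been saved (from the browser, or through something like Selenium). Below is a minimal sketch using only the Python standard library. It assumes each customer sits in an <li> element; the <customers>/<customer> element names and the sample markup are made up for illustration, since the real page structure isn't shown in the question.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class ListItemCollector(HTMLParser):
    """Collects the visible text of every <li> element in a page."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._depth = 0      # greater than 0 while inside an <li>
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._depth += 1
            if self._depth == 1:
                self._buffer = []

    def handle_endtag(self, tag):
        if tag == "li" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace into single spaces.
                text = " ".join("".join(self._buffer).split())
                if text:
                    self.items.append(text)

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def html_list_to_xml(html):
    """Return an XML string with one <customer> per <li> found in html.

    The element names are hypothetical; rename them to suit the target file.
    """
    collector = ListItemCollector()
    collector.feed(html)
    root = ET.Element("customers")
    for text in collector.items:
        ET.SubElement(root, "customer").text = text
    return ET.tostring(root, encoding="unicode")


# Made-up sample markup standing in for the saved back-office page.
SAMPLE = "<ul><li>Alice, alice@example.com</li><li>Bob, bob@example.com</li></ul>"
XML_OUTPUT = html_list_to_xml(SAMPLE)
print(XML_OUTPUT)
```

If the real list nests name, email and phone in separate tags, the collector would need to look at those tags (or their class attributes) instead of treating each <li> as one text blob.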

Related

How to export my Hybris website?

I've read the hybris wiki but I wasn't able to find any related information...I have made a website in one hybris platform on localhost, and now I need to get all the content, products, pages, etc and import it in another platform. Do you have any idea how to do this?
Thanks!
You can export your data by writing an export script.
Creating Headers for Export
Log into the hybris Management Console.
Navigate to System > Tools > Script Generator.
In the Wizard: Script Generator, click on the Advanced Settings tab.
Clear the Document ID check box.
Switch back to the Script tab.
Select [y] hybris MigrationScriptModifier from the Script modifier drop-down menu.
Click the Generate button.
The Script Generator writes the headers for all types in the hybris Commerce Suite into the Script field.
Copy and paste the list of headers into a text editor.
Remove the headers for the types you do not want to export.
Here is a list of some examples:
Export CMS Content
Users and Addresses
Catalog Version Content
Classification System
For more information on the different ways to run an export, follow this link: Impex - User Guide
Quick and dirty db dump:
If you're not worried about overwriting existing data on the other platform, you could simply copy the database over. More than likely you will also need to copy the hybris/data folder, or you'll have missing media references. If you're using MySQL you can use the mysqldump utility. This is the quick easy way if you just need to stand up a UAT environment, for example.
Impex export
The better way would be by creating impex scripts that will export all the items needed. On the hybris wiki, search for "Data Exporter", which is a page containing a link to an HMC extension called advancedexport. It will allow you to set up your export more easily.
There are two scenarios :
1 - You have made all the website configuration by impex files and therefore you will just need to initialise your system on the new platform
2 - You have done all the website configuration manually
If you are running the exact same Hybris version and code, then exporting / importing the data is possible
If you are not running exactly the same platform, it becomes tricky: you could either use the import/export functionality (good examples here) or try to generate Jinja templates for each item type (example here).
Another good option here is Hybris-to-Hybris Synchronization, which lets you transfer all your data by taking advantage of Data Hub. This is a very good solution when you want to preserve performance on the source system, or when you are dealing with different versions of hybris. See more information on this here.

Scraping a website where all the data is locked in an XML database?

I am trying to download the full archive files of this website (http://www.afghanislamicpress.com/).
I tried using DeepVacuum (http://www.hexcat.com/deepvacuum/index.html) but the site is dynamic (I think that's the right word).
So you submit a form that gives the article archive, but it only spits out 5 at a time (i.e. per page) and then you have to click through. I want to download all the individual articles for the full data set, but don't want to manually click through.
I know there's some easy way to do this, but not entirely sure how.
Any suggestions for a novice at doing data scraping etc?
The most straightforward solution would be to contact the owner of the website and request their permission to republish their articles, and ask for a digital copy.
You can certainly automate pulling down content that is paged, but it requires some programming effort. The best tool for that imho is HTML Agility Pack.
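To give a rough idea of what "automate pulling down content that is paged" involves, here is a minimal sketch of the paging loop. It uses Python's standard library rather than HTML Agility Pack (which is a .NET library), and it makes several assumptions: fetch_page is a hypothetical function you would write to submit the archive form for a given page number and return the HTML, and every link on a result page is treated as an article link. The fake pages at the bottom only stand in for the real site.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def collect_article_links(fetch_page, max_pages=1000):
    """Walk result pages 1, 2, 3, ... until a page yields no new links.

    fetch_page is a caller-supplied (hypothetical) function that takes a
    page number and returns that result page's HTML as a string.
    """
    seen = []
    for page in range(1, max_pages + 1):
        parser = LinkCollector()
        parser.feed(fetch_page(page))
        found_new = False
        for href in parser.links:
            if href not in seen:
                seen.append(href)
                found_new = True
        if not found_new:
            break  # empty or repeated page: assume the archive is exhausted
    return seen


# Fake result pages standing in for the real archive.
FAKE_PAGES = {
    1: '<a href="/article/1">One</a><a href="/article/2">Two</a>',
    2: '<a href="/article/3">Three</a>',
}
demo_links = collect_article_links(lambda n: FAKE_PAGES.get(n, ""))
print(demo_links)
```

In real use, fetch_page would make the HTTP request (for example with urllib.request) and should pause between requests so the site isn't hammered; each collected link can then be downloaded in a second loop.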
Please be sure to comply with the copyright and licensing terms of the content you are downloading.

How do I build custom workflows in SharePoint?

I need to learn how to build custom workflows in SharePoint. In addition to basic stuff like having legal sign-off on documents, I need to be able to execute arbitrary custom code at certain points. For example, after legal signs off it should export the document and update a database table indicating that a new version is ready.
Is this possible using SharePoint? If so, where can I find the documentation or tutorials I would need to get started?
The Getting Started Link 1 and Getting Started Link 2 links will help you get started with SharePoint custom workflows. It is also possible to run arbitrary code inside a workflow. There is also another type of workflow, called a state machine workflow, for which there is a first-rate article.
A workflow as a whole is built from sub-parts called activities, which dictate what the workflow should do at a particular point in time. As I recall, there is an activity called Code Activity that will help you achieve what you want.
Robert Shelton's Workflow Video Tutorial series is very helpful:
http://rshelton.com/Tags/Workflow/default.aspx

Has anybody ever tried to screen scrape data from sites built with SharePoint?

Or at least could anybody point me to docs about its crazy proprietary url parameters and html field name obfuscation? I can only suppose this is caused by SharePoint...
The main problem is that, given a start page built with SharePoint, I can't recreate a form post with a programmatic client because:
field names vary; they have some sort of id, hash, or whatever appended (per session, I think? Not sure)
tracing HTTP traffic on my side, I see the HTTP request is packed with strange parameters like __REQUESTDIGEST, __VIEWSTATE, and many others
Is this an intentional protection device put up by SharePoint? What is the underlying architecture, and which objects are involved (script callbacks, ... )?
(BTW, I'm not doing anything evil, just trying to extract public government data from a website).
Thanks.
SharePoint is nothing more than an ASP.NET application; it is built entirely on top of ASP.NET 2.0.
That being said, __VIEWSTATE is nothing but a hidden field that holds the view state information.
As for __REQUESTDIGEST, this is an intentional protection: it carries a security validation value called the FormDigest.
Finally, to answer your question: you will not be able to guess the field names unless you can change the source code of the application. The field names look obfuscated because those controls are not handwritten but generated by the ASP.NET engine and parser; the mechanism that gives the fields such names is called a naming container.
One suggestion: rather than scraping the screen data, try alternative approaches. Each list in SharePoint has a built-in XML feed, so try to consume that; if you have access to the site, try to retrieve the information using Export to Excel, etc.
In addition to RSS, SharePoint also has a Web Services interface that you can use to get at and interact with data stored in SharePoint in a programmatic way.
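If neither the feed nor the Web Services interface is available and the form post still has to be replayed, the usual approach with ASP.NET pages is not to guess the field names but to read them from the page that was just fetched and send them back unchanged. The following is only a sketch of that idea in Python: the sample page, the field values and the ctl00$m$search field name are invented for illustration, and a real SharePoint page may require additional fields or cookies.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldCollector(HTMLParser):
    """Collects name/value pairs of all <input type="hidden"> fields."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_post_body(page_html, extra_fields):
    """Echo back the page's hidden fields and add the caller's own values."""
    collector = HiddenFieldCollector()
    collector.feed(page_html)
    payload = dict(collector.fields)
    payload.update(extra_fields)
    return urlencode(payload)


# Invented sample page; real values are long opaque strings.
SAMPLE_PAGE = (
    '<form>'
    '<input type="hidden" name="__VIEWSTATE" value="abc123" />'
    '<input type="hidden" name="__REQUESTDIGEST" value="0xDEAD,01 Jan" />'
    '<input type="text" name="ctl00$m$search" value="" />'
    '</form>'
)
# "ctl00$m$search" is a hypothetical generated field name read off the page.
post_body = build_post_body(SAMPLE_PAGE, {"ctl00$m$search": "budget"})
print(post_body)
```

Because the hidden values and generated names can change between requests, the page would need to be fetched and parsed again before each post rather than cached.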

Sharepoint: Best way to display lists of non-Sharepoint content with "compatible" UI?

I've built a web part for Sharepoint that retrieves data from an external service. I'd like to display the items in a way that's UI-compatible with Sharepoint (fits in with its surroundings.)
I'm aware of the "DataFormWebPart" but was unable to get one working properly. It requires a valid DataSource and I was unable to build one from the results of a web service call... Part of the problem is that my web service wrappers don't expose the XML return info, rather I have a bunch of deserialized objects. There doesn't seem to be an easy way to turn actual objects into a datasource, or populate a "generic" datasource from object data.
I could use an SPGridView to get the same UI, but the grid control doesn't have much in the way of smarts -and- it forces every field into its own column. I'd prefer to render each list item as a single cell with complex rendering (for instance the way that StackOverflow shows its lists of questions.) I'd also like to get as much of the Sharepoint-standard UI as possible, such as the sorting, filtering, and paging controls.
So, first: Has anyone here written a Sharepoint control that does this, and if so do you have sample code to share? If not: am I overlooking some useful control, whether MS-supplied or available in an external library?
Thanks!
Steve
Take a look at the built in sharepoint web controls:
Microsoft.SharePoint.WebControls Namespace
It contains all the controls used in sharepoint. I'd tell you more, but the documentation is very thorough.
The problem with SharePoint is that there are a bunch of different ways to do this. If your data does not change too often and is not overly large, it may be worth entering it into a list for display.
If you have the Enterprise licence it may be worth getting your data into the BDC and using it there.
You may have to convert the objects into XML, or use the serialised objects with the XML web part for display. This still leaves the issue of custom rendering using XSLT.
Here's a great article that explains how to configure BDC connections to web services using the BDC Definition Editor:
Creating a Web Service Connection by Using the Business Data Catalog Definition Editor
http://msdn.microsoft.com/en-us/library/bb737887.aspx
The best way to do this IMO is to make a Web Part. As a Web Part the UI will be automatically rendered to be the same as the theme the site is using (unless you override it) and it will be able to be placed anywhere by anyone with admin privileges.
Tutorial on making a Web Part
Tutorial on packaging and deploying a Web Part
Example Web Part Source Code
You could create a custom web part and use an SPGridView. You say you don't like it, because it forces every field into its own column, but that's not true. You can create a template (ITemplate) for every column and fully customize what's shown inside it, just like you would using a normal ASP.Net GridView. Using this approach I've added the little "New" images right next to a list item's Title, just like SharePoint does itself.
