Data Scraping from PDF and Excel [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I am doing a little data scraping. There are three types of file I am scraping data from:
1- HTML
2- PDF
3- Excel (xls)
For HTML I am comfortable; I am using HTML Agility Pack for that.
For PDF and Excel I need suggestions.
Thanks in advance.

Concerning Excel: if you are in a Microsoft environment, you can either use Office Automation or read the file through OLE DB. In a Java environment, look at Apache POI.
EDIT: Concerning PDF, in Java try Apache PDFBox. It can also be used from .NET via IKVM.

I can recommend Cogniview's PDF2XL, a reasonably inexpensive commercial product, to extract data from tables in PDF files into Excel. We have used it with great success.

HTML Agility Pack is a library, and it's good to use. But why do you need separate tools for different data extraction purposes? You could use Automation Anywhere to extract data from any source. As far as I know, it works with all three sources you have specified.
Google it.

You can use UiPath to achieve this. It can scrape PDF, Excel, HTML, Java, Windows, .NET, WPF, and legacy applications with 100% accuracy. It also works with virtualized environments, but only via OCR scraping.
It can be used from code (via the SDK), and you can also create visual automations (workflows) in UiPath Studio.
Here's a tutorial on web data extraction
Note: I work at UiPath, so I know it can do the job. You should also try other visual automation tools like Automation Anywhere, WinAutomation, and Jacada, use them side by side, and choose the one that suits you best.

Related

How to add preview functionality for Markdown/LaTeX code in a ReactJS application

I recently created a full-stack blogging website. For writing posts I used react-quill because, with its extensible architecture and expressive API, I can fully customize it to my needs. But now I feel I need something that can also handle Markdown and LaTeX. Is there a library that can handle both, or at least LaTeX, with an expressive API?
One more question: how does Stack Overflow provide its preview (a live preview of our Markdown/LaTeX code) when we ask a question? It's a really important feature when writing a blog. Any information related to this would be a great help.
I found react-latex and react-markdown, which handle LaTeX and Markdown separately, but I'm still stuck on how to set up the preview functionality. Any ideas would be appreciated.
Pass the text to react-markdown and render its output in a div element as the preview.
You can check the source code of an open-source online Markdown editor; some of them also support LaTeX.
Reference: StackEdit

Ext.Net brief introduction along with advantages and disadvantages [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I am new to Ext.Net. Can anyone explain what Ext.Net is, what its advantages and disadvantages are, and why we would choose it?
Thanks in advance.
Ext.NET (known as Coolite until November 2010) is a suite of professional ASP.NET AJAX Web Controls which includes the Sencha ExtJS JavaScript Framework.
Source
Take a look at their examples and you will see that Ext.NET provides a quicker way to develop web apps.
WHAT is Ext.NET?
Ext.NET is an advanced ASP.NET (WebForms + MVC) component framework integrating the cross-browser Sencha Ext JS JavaScript Library.
WHO is it for?
If you are looking to build a rich, modern web application with cutting edge web technologies, unparalleled cross-browser compatibility and an advanced MVC architecture then Ext.NET is for you!
WHY should you use it?
Ext.NET is built for developers, by developers. We provide hundreds of Demos in our Examples Explorer. Need a little support? Check out our Developer Bundles.
Source

Full-stack NodeJS framework? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 8 years ago.
Is there a full-stack, NodeJS-based framework similar to Ruby on Rails or PHP which renders templates on the server-side?
Basically, I want to develop a website that Google can index (i.e., not a SPA). I'd like to include common header and footer files on each page. In short, I want to do the following, but with NodeJS:
index.php:
<?php include 'header.php'; ?>
<h1>This is the home page</h1>
<p>Here is some content.</p>
<?php include 'footer.php'; ?>
I will not be using a RESTful API (or any API) for this web site. It's basically a simple, static web site which uses NodeJS for its server-side component.
I would suggest that DocPad is the best solution at the moment for what you're looking for. It lets you set up templates, content, and styles from which it generates a static site for you.
You choose the templating engine you'd like to use through a plugin system. The tutorial (which I followed recently) explains very clearly how to do what you want; it uses the eco templating engine. I knew nothing about eco, yet I was able to follow along and work out some tricky requirements of my own without much trouble.
If you go this route, I also suggest the partial plugin, which is really handy for inserting pieces of content into other pages.
Start here to learn how to use it. It takes you through everything you could need to know.
I also suggest installing node.js as per these instructions.
If you have experience with Backbone.js, Rendr.js shows some promise.
Your best bet is to use a simple framework like Express (http://expressjs.com/) along with a simple templating engine like Handlebars.
This is the module you need to add to the project:
https://github.com/ericf/express-handlebars
Its README is excellent and has fully working examples showing your two options:
Use a global layout file (this will contain your header and footer).
Use partials, similar to the PHP example above.
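To make the layout-plus-partials idea concrete without installing any packages, here is a minimal sketch in plain Node.js. It is not Handlebars or express-handlebars; it only imitates two conventions those libraries use: a layout with a `{{{body}}}` slot, and `{{> name}}` partial includes. The `partials` object, the `layout` string, and `renderPage` are hypothetical names chosen for illustration.

```javascript
// Hypothetical partials; with express-handlebars these would be separate
// files (by default under views/partials/).
const partials = {
  header: '<header><a href="/">Home</a></header>',
  footer: '<footer>&copy; Example site</footer>',
};

// Hypothetical layout; plays the role of a layout file such as
// views/layouts/main.handlebars. Every page is wrapped in it.
const layout = [
  '<!DOCTYPE html>',
  '<html><body>',
  '{{> header}}',
  '{{{body}}}',
  '{{> footer}}',
  '</body></html>',
].join('\n');

// Replace each {{> name}} with the named partial (unknown names become ''),
// then drop the page-specific body into the {{{body}}} slot. A function is
// used as the replacement so "$" sequences in the body are kept literally.
function renderPage(body) {
  return layout
    .replace(/\{\{>\s*(\w+)\s*\}\}/g, (match, name) => partials[name] ?? '')
    .replace('{{{body}}}', () => body);
}

// Same page content as the PHP example in the question.
console.log(renderPage('<h1>This is the home page</h1>\n<p>Here is some content.</p>'));
```

Each page only supplies its own body, and the header and footer are written once. In an actual Express app you would let express-handlebars load the layout and partials from files and call `res.render` instead of writing `renderPage` yourself.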
Keep in mind that server-side rendering is the default in the Node world, just as in PHP, only in JS. Frameworks like Rendr are trying to do something much more complex, sharing rendering between server and client; you don't need that if you are just building a simple website.
Best of luck.

How do I learn Cognos by myself without data to work on?

I have taken a course in data warehousing and data mining, and I am interested in pursuing Cognos business intelligence. However, I need data to practice on, and I cannot find any. How can I get better acquainted with Cognos? I need sample data to work on. Where can I find it?
Try http://www.data.gov. They have tons of large datasets of all shapes and sizes.
There's also the Stack Overflow data dump torrent.
Microsoft has a great set of reusable sample data for SQL Server. It's much better than the sample gobbledygook that comes with Cognos.
http://msftdbprodsamples.codeplex.com/
You can use one of the free versions of SQL Server with Cognos.
I'm assuming you have access to the product. The absolute best way to learn it is to use the Cognos Examples, an optional component that comes with Cognos.
The examples not only install GoSalesDb, their sample data, but also include best-practice Framework Models/Cubes/Dashboards/Reports built on this data, so you can see how the tool is meant to be used.
Almost everything Cognos/IBM uses to train power users and content authors is built on this same sample data.
The examples used to be a joke in early releases, but now they are seriously polished and make it easy not only to learn the product but also to see techniques and approaches that have taken people years to learn.
Once the examples are installed, you will see all this content in the Cognos web portal (Packages/Reports/Cubes/Dashboards/Metric Reports/Alerts), all pre-built and ready for you to run, review, and study, and you have the GoSales data source to use as your playground.
Look in the samples directory of the webcontent folder, or have your Cognos administrator set it up.
Why not just use the sample databases that come WITH Cognos 8? If you have access to a Cognos 8 BI installation, you can install the Great Outdoors sample databases, which also come with Framework Manager (FM) models and packages for Cognos 8 deployment.
Instructions on installing the samples can be found on the IBM site: Install the IBM Cognos 8 Samples.

What tool visually maps out existing websites?

I used a tool a few months ago that scanned a specified website and created a visual hierarchy of the website's page links. It also represented each page with its appropriate screenshot.
Does anyone know what tool this is? Or maybe something that performs the same basic features?
Scratch that, I found it! InfoRapid's Knowledge
It produces visual sitemaps, for example of Google's site.
You can use:
Visio
FrontPage
There is also a similar question on Stack Overflow.
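One way such a tool could derive its hierarchy: crawl the site, record which pages link to which, then turn that link graph into a tree with a breadth-first search, so each page appears once, under the first page found linking to it. The sketch below, in plain Node.js, skips the fetching and screenshot steps and uses a hypothetical hard-coded `links` map in place of crawled data; `buildSitemap` and `formatSitemap` are illustrative names, not part of InfoRapid, Visio, or FrontPage.

```javascript
// Hypothetical link graph: page -> pages it links to. A real crawler would
// build this by fetching each page and collecting its <a href> targets.
const links = {
  '/': ['/about', '/products', '/contact'],
  '/about': ['/', '/team'],
  '/products': ['/products/a', '/products/b', '/'],
  '/contact': [],
  '/team': ['/about'],
  '/products/a': ['/products'],
  '/products/b': ['/products/a'],
};

// Breadth-first search from the start page. Each page is attached as a child
// of the first page that reaches it, which turns the cyclic link graph into a
// tree. Returns a Map of page -> array of child pages.
function buildSitemap(graph, start) {
  const children = new Map([[start, []]]);
  const queue = [start];
  while (queue.length > 0) {
    const page = queue.shift();
    for (const target of graph[page] ?? []) {
      if (!children.has(target)) {
        children.set(target, []);
        children.get(page).push(target);
        queue.push(target);
      }
    }
  }
  return children;
}

// Render the tree as indented lines, two spaces per level.
function formatSitemap(children, page, depth = 0) {
  const lines = ['  '.repeat(depth) + page];
  for (const child of children.get(page)) {
    lines.push(...formatSitemap(children, child, depth + 1));
  }
  return lines;
}

console.log(formatSitemap(buildSitemap(links, '/'), '/').join('\n'));
```

A visual tool would draw the same tree as boxes (with screenshots) instead of indented text; breadth-first order keeps each page close to the top of the site, which usually matches how people navigate it.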
