I'm calling SchemaCrawler in the following way:
call java -classpath ../_schemacrawler/lib/*;lib/* schemacrawler.Main -server=mysql -database=db_db -host=localhost -user=user -password=pwd -infolevel=maximum -command=brief -portablenames=false -tabletypes=TABLE -routines=.*\.X.*.* -routines=.*\.X.*.* -outputformat=html -o=html.html %*
It generates nice HTML output, but I would also like to see the table COMMENT text. Comments appear for columns, but I cannot find a way to show them for tables.
I guess it's related to the -noremarks option, but I have already tried it without success.
How should I proceed?
I'm wondering if it's possible to export the table produced by an ANOVA to an Excel or .csv file. I'm running repeated-measures two-way ANOVAs with RMAOV2 (http://uk.mathworks.com/matlabcentral/fileexchange/5578-rmaov2). Here is the code I'm using, which works fine and produces a table with the ANOVA results.
dir ='/Users/Documents/folder';
cd(dir)
file = readtable('file.csv');
toAnalyse = table2array(file);
RMAOV2(toAnalyse);
However, when I try to save the ANOVA results so that I can export them to Excel or a .csv file, it doesn't work:
ANOVAresults = RMAOV2(toAnalyse);
Error:
Output argument "RMAOV2" (and maybe others) not assigned during call to "RMAOV2".
Any suggestion would be very appreciated.
If you take a look into the source code of the file, you will notice that it never assigns anything to the return variable. Instead it only prints data to the command window.
To resolve this problem you have to edit the source code and assign the data you want to return. Alternatively, you can contact the author.
I'm trying to figure out a good way to increase the productivity of my data entry job.
What I am looking to do is come up with a way to scrape data from a PDF and input it into Excel.
More specifically the data I am working with is from grocery store flyers. As it stands now we have to manually enter every deal in the flyer into a database. A sample of a flyer is http://weeklyspecials.safeway.com/customer_Frame.jsp?drpStoreID=1551
What I am hoping to do is have columns for products, price, and predefined options (Loyalty Cards, Coupons, Select Variety... that sort of thing).
Any help would be appreciated, and if I need to be more specific let me know.
After looking at the specific PDF linked to by the OP, I have to say that this is not quite displaying a typical table format.
It contains many images inside the "cells", but the cells are not all strictly vertically or horizontally aligned:
So this isn't even a 'nice' table, but an extremely ugly and awkward one to work with...
Having said that, I'll have to add:
Extracting even 'nice' tables from PDFs in general is extremely difficult...
Standard PDFs do not provide any hints about the semantics of what they draw on a page:
the only distinction that the syntax provides is between vector elements (lines, fills, ...), images and text.
Whether any character is part of a table or part of a line or just a lonely, single character within an otherwise empty area is not easy to recognize programmatically by parsing the PDF source code.
For a background about why the PDF file format should never, ever be thought of as suitable for hosting extractable, structured data, see this article:
Why Updating Dollars for Docs Was So Difficult (ProPublica-Website)
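To give a feel for the kind of guesswork involved, here is a minimal Ruby sketch of the row-grouping heuristic an extractor has to fall back on when all it gets is text fragments with positions. The `Fragment` struct, the sample coordinates and the `tolerance` value are all made up for illustration; this is not how Tabula or PDFBox actually implement it.

```ruby
# Hypothetical text fragments as a PDF parser might report them:
# a string plus the x/y position at which it was drawn.
Fragment = Struct.new(:text, :x, :y)

# Group fragments into rows: a fragment whose y value lies within
# `tolerance` of the first fragment of the current row joins that row,
# otherwise it starts a new row. Cells are then ordered left to right.
def group_into_rows(fragments, tolerance: 2.0)
  rows = []
  # PDF y coordinates grow upwards, so sort from top to bottom of the page.
  fragments.sort_by { |f| [-f.y, f.x] }.each do |frag|
    row = rows.last
    if row && (row.first.y - frag.y).abs <= tolerance
      row << frag
    else
      rows << [frag]
    end
  end
  rows.map { |row| row.sort_by(&:x).map(&:text) }
end

# Invented sample data: note the slightly different baselines per "row".
fragments = [
  Fragment.new("Price",   200.0, 700.0),
  Fragment.new("Product",  50.0, 700.5),
  Fragment.new("1.99",    200.0, 680.0),
  Fragment.new("Apples",   50.0, 679.6),
]

rows = group_into_rows(fragments)
rows.each { |r| puts r.join(" | ") }
```

Even this toy version depends on a hand-picked tolerance, and it breaks as soon as cells wrap onto several lines or are not aligned, which is exactly the situation in the flyer above.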
...but doing so with TabulaPDF works very well!
Having said the above, let me now add this:
For an amazing open source family of tools that gets better and better from week to week for extracting tabular data from PDFs (unless they are scanned pages) -- contradicting what I said in my introductory paragraphs! -- check out TabulaPDF. See these links:
Introducing Tabula: Upload a PDF, get back tabular CSV data. Poof!
Tabula-Extractor: A Command Line Interface to Tabula
Tabula source code repository
Tabula API (upcoming, not ready yet)
Tabula-Extractor is written in Ruby.
In the background it makes use of PDFBox (which is written in Java) and a few other third-party libs.
To run, Tabula-Extractor requires JRuby 1.7 to be installed.
Installing Tabula-Extractor
I'm using the 'bleeding-edge' version of Tabula-Extractor directly from its GitHub source code repository.
Getting it to work was extremely easy, since on my system JRuby-1.7.4_0 is already present:
mkdir ~/svn-stuff
cd ~/svn-stuff
git clone https://github.com/tabulapdf/tabula-extractor.git git.tabula-extractor
Included in this Git clone will already be the required libraries, so no need to install PDFBox.
The command line tool is in the /bin/ subdirectory.
Exploring the command line options:
~/svn-stuff/git.tabula-extractor/bin/tabula -h
Tabula helps you extract tables from PDFs
Usage:
  tabula [options] <pdf_file>
where [options] are:
  --pages, -p <s>:        Comma separated list of ranges, or all. Examples:
                          --pages 1-3,5-7, --pages 3 or --pages all. Default
                          is --pages 1 (default: 1)
  --area, -a <s>:         Portion of the page to analyze
                          (top,left,bottom,right). Example: --area
                          269.875,12.75,790.5,561. Default is entire page
  --columns, -c <s>:      X coordinates of column boundaries. Example
                          --columns 10.1,20.2,30.3
  --password, -s <s>:     Password to decrypt document. Default is empty
                          (default: )
  --guess, -g:            Guess the portion of the page to analyze per page.
  --debug, -d:            Print detected table areas instead of processing.
  --format, -f <s>:       Output format (CSV,TSV,HTML,JSON) (default: CSV)
  --outfile, -o <s>:      Write output to <file> instead of STDOUT (default:
                          -)
  --spreadsheet, -r:      Force PDF to be extracted using spreadsheet-style
                          extraction (if there are ruling lines separating
                          each cell, as in a PDF of an Excel spreadsheet)
  --no-spreadsheet, -n:   Force PDF not to be extracted using
                          spreadsheet-style extraction (if there are ruling
                          lines separating each cell, as in a PDF of an Excel
                          spreadsheet)
  --silent, -i:           Suppress all stderr output.
  --use-line-returns, -u: Use embedded line returns in cells. (Only in
                          spreadsheet mode.)
  --version, -v:          Print version and exit
  --help, -h:             Show this message
Extracting the table which the OP wants
I'm not even trying to extract this ugly table from the OP's monster PDF. I'll leave it as an exercise to those readers who are feeling adventurous enough...
Instead, I'll demo how to extract a 'nice' table. I'll take pages 651-653 from the official PDF-1.7 specification, here represented with screenshots:
I used this command:
~/svn-stuff/git.tabula-extractor/bin/tabula \
-p 651,652,653 -g -n -u -f CSV \
~/Downloads/pdfs/PDF32000_2008.pdf
After importing the generated CSV into LibreOffice Calc, the spreadsheet looks like this:
To me this looks like a perfect extraction of a table that was spread across 3 different PDF pages. (Even the newlines used within table cells made it into the spreadsheet.)
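If you'd rather post-process the CSV in code than in a spreadsheet, Ruby's standard `csv` library copes with quoted cells that contain newlines. This is a minimal sketch, assuming the output uses standard CSV quoting. The sample string below is invented and merely stands in for Tabula's real output:

```ruby
require "csv"

# Invented sample standing in for the extracted CSV. The second cell of
# the data row contains an embedded newline, so it is wrapped in quotes.
csv_text = "Key,Description\nType,\"First line\nsecond line\"\n"

# CSV.parse returns an array of rows, each row an array of cell strings.
# The quoted newline stays inside its cell instead of starting a new row.
rows = CSV.parse(csv_text)

rows.each { |row| p row }
```

For a real file you would use `CSV.read` with the path passed to Tabula's `-o` option instead of parsing a string.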
Update
Here is an ASCiinema screencast (which you also can download and re-play locally in your Linux/MacOSX/Unix terminal with the help of the asciinema command line tool), starring tabula-extractor:
I've been trying to get started with Cucumber and Watir. I've done the installation and written/copied a simple 'feature'. However, I'm not getting anywhere with checking what's on a page.
This is my feature:
Feature: Search Google
  In order to make sure people can find my documentation
  I want to check it is listed on the first page in Google
  Scenario: Searching for JS.Class docs
    Given I have opened "http://www.google.com/"
    When I search for "JS.Class"
    Then I should see a link to "http://jsclass.jcoglan.com/" with text "JS.Class v3.0"
My env.rb file looks like this
require 'watir'
require 'rspec'
And this is my search_steps.rb
@ie
Given /^I have opened "([^\"]*)"$/ do |url|
  @ie = Watir::IE.new
  @ie.goto(url)
end
When /^I search for "([^\"]*)"$/ do |arg1|
  @ie.text_field(:name, "q").set(arg1)
  @ie.button(:name, "btnG").click
end
Then /^I should see a link to "([^\"]*)" with text "([^\"]*)"$/ do |arg1, text|
  puts arg1
  # @ie.text.should include(arg1)
  @ie.close
end
When I uncomment the line '@ie.text.should include(arg1)' I get this error thrown.
Then I should see a link to "http://jsclass.jcoglan.com/" with text "JS.Class v3.0" # features/step_definitions/search_steps.rb:13
true
undefined method `split' for ["http://jsclass.jcoglan.com/"]:Array (NoMethodError)
./features/step_definitions/search_steps.rb:17:in `/^I should see a link to "([^\"]*)" with text "([^\"]*)"$/'
features\search.feature:8:in `Then I should see a link to "http://jsclass.jcoglan.com/" with text "JS.Class v3.0"'
Failing Scenarios:
cucumber features\search.feature:5 # Scenario: Searching for JS.Class docs
Right now the Then function isn't doing its job. If I comment out the line then all it's really doing is writing out some text.
The gems I have include the following: cucumber <1.1.9>, rspec <2.9.0>, watir <2.0.4> and watir-webdriver <0.5.3>.
The version of ruby I have is: ruby 1.9.3p125.
I'm running this on Windows 7 Ultimate.
I'm wondering if I'm missing a gem or something. Reading the message implies that I'm calling a method it can't find but I've found quite a few pages on the web that use this method.
Any help, guidance or pointers in the right direction are gratefully welcomed.
The problem is that if you (manually) look at the text of the Google search results, you will notice that the text "http://jsclass.jcoglan.com/" does not appear in it. So, as expected, the test will fail.
To fix this, you can correct the call to the step by removing the 'http://':
Then I should see a link to "jsclass.jcoglan.com/" with text "JS.Class v3.0"
That said, it would be more robust to check for the link itself, using the following code. Checking only that the text is somewhere on the page can lead to false positives (see the latest post on WatirMelon).
@ie.link(:url => arg1, :text => text).exists?.should be true
Hello,
I wanted to ask how I can join three different PDF documents so that they appear in a single appendix. The command I gave was:
See Appendix~\ref{sec:corr-1}~\ref{sec:corr-2}~\ref{sec:corr-3}
and I have the following in my list of appendices:
\subsubsection{Writing}
\label{sec:writing}
\input{corr-1.tex}
\input{corr-2.tex}
\input{corr-3.tex}
Unfortunately, I cannot compile the final document.
Thanks a lot in advance.
Have a nice day.
Marie
In your example you just include tex files.
Have a look at the package pdfpages: http://mirror.ox.ac.uk/sites/ctan.org/macros/latex/contrib/pdfpages/pdfpages.pdf
Add to the top of your document:
\usepackage{pdfpages}
You should call this in the text where you want the PDF to appear:
\includepdf{test.pdf}
When I run sp_helpdb dbname in Sybase Adaptive Server Enterprise, it returns only the following columns:
name,db_size,owner,dbid,created,status
And it's not returning the following columns:
device_fragments,size,usage,created,free kbytes
Why is this happening?
Both sets are returned; where they are displayed, however, depends on which tool you're using to run the query. If you're using SQL Advantage or ASEISQL, you need to look in both the results and the messages windows to get the full output. If you're using the command line ISQL, everything is returned together.
It's because some of the results are returned from a select, and some from print messages.
print "Print hello"
select "Select hello"
Try running the above and you'll hopefully find where each different output is displayed in your tool.
If you're using SQL Advantage, the options screen lets you change how your results are returned. The "Display Print Messages with Results" option might help in this case.