How to read real numeric values instead of formatted values using the Apache POI XSSF streaming API?

I use the streaming POI API and would like to read the real value of a cell instead of the formatted one. The code below works, but if the user has formatted a cell so that not all digits of a value are displayed in the Excel sheet, my result contains the same truncated value. I haven't found a solution in the streaming API, which I need in order to avoid the memory problems I had with the non-streaming POI API.
/**
 * @see org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler cell(java.lang.String,
 * java.lang.String)
 */
@Override
void cell(String cellReference, String formattedValue, XSSFComment comment) {
    useTheCellValue(formattedValue)
}

When you construct the XSSFSheetXMLHandler you can pass in a DataFormatter. If you supply your own DataFormatter subclass, you get full control over how values are formatted.
Here is how this could look, changing the public void processSheet method of the XLSX2CSV example in svn:
...
public void processSheet(
StylesTable styles,
ReadOnlySharedStringsTable strings,
SheetContentsHandler sheetHandler,
InputStream sheetInputStream) throws IOException, SAXException {
//DataFormatter formatter = new DataFormatter();
DataFormatter formatter = new DataFormatter(java.util.Locale.US) {
//never apply the cell's number format to double values, but still format dates
public java.lang.String formatRawCellContents(double value, int formatIndex, java.lang.String formatString) {
if (org.apache.poi.ss.usermodel.DateUtil.isADateFormat(formatIndex, formatString)) {
return super.formatRawCellContents(value, formatIndex, formatString);
} else {
//return java.lang.String.valueOf(value);
return super.formatRawCellContents(value, 0, "General");
}
}
};
InputSource sheetSource = new InputSource(sheetInputStream);
try {
XMLReader sheetParser = SAXHelper.newXMLReader();
ContentHandler handler = new XSSFSheetXMLHandler(
styles, null, strings, sheetHandler, formatter, false);
sheetParser.setContentHandler(handler);
sheetParser.parse(sheetSource);
} catch(ParserConfigurationException e) {
throw new RuntimeException("SAX parser appears to be broken - " + e.getMessage());
}
}
...
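
To see why the formatted value loses digits in the first place, here is a small stdlib-only illustration that does not use POI at all. java.text.DecimalFormat merely stands in for the cell's number format (Excel patterns and DecimalFormat patterns are not identical), and the class and method names are made up for this sketch.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

final class RawVersusFormatted {

    // Mimics what a display format such as "0.00" does to the stored double.
    static String formatted(double value, String pattern) {
        DecimalFormat format = new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.US));
        return format.format(value);
    }

    // Keeps the digits the double actually holds.
    static String raw(double value) {
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        double stored = 1.23456789;
        System.out.println(formatted(stored, "0.00")); // display value, digits cut off
        System.out.println(raw(stored));               // full stored value
    }
}
```

The overridden formatRawCellContents above follows the same idea: dates keep their format, while every other number is rendered without the cell's own display format.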

I've seen a POI ticket about this: https://bz.apache.org/bugzilla/show_bug.cgi?id=61858
It provides a first solution by changing the existing class.
This could be an interesting workaround, although ideally a standard solution would be available.

Related

Spring + ibatis + String matching

I have a Spring application integrated with ibatis.
I call a third-party application and get a String message back as output. The message is a combination of several messages, concatenated with a \ delimiter.
I have to filter this output based on String matching. There are some 150 other Strings; if the output message contains any of these 150 messages, I have to add some functionality.
I need suggestions on how to implement this. I am thinking of putting the 150 Strings in a table, as the count may increase in the future. The output may contain none of these 150 messages, or any combination of them.
I am new to Spring. Please tell me how to get these messages from the database, since I do not have an id to fetch them by. Or should I get all of them as a list and then compare them with the output string from the third party? Also, please tell me whether it is wise to keep these messages in the database or whether I can keep them in a property file instead, and which one will perform better.
Thanks in advance.
Ok, let's start with some possibilities:
If you will only be adding a few messages in the future, and only with new releases, then storing the messages in an enum would be a viable choice:
enum ErrorMessage {
    SOME_MESSAGE("something, bla bla"),
    SOME_OTHER_MESSAGE("something_else"),
    ...;
    private final String message;
    private ErrorMessage(String message) {
        this.message = message;
    }
    public static ErrorMessage getByErrorMessage(String message) {
        // iterate over all enum constants, not over the String argument
        for (ErrorMessage errorMessage : values()) {
            if (errorMessage.message.equals(message)) {
                return errorMessage;
            }
        }
        return null;
    }
    public static boolean exists(String message) {
        return getByErrorMessage(message) != null;
    }
}
Please note that this version is quite primitive and could be improved by adding all the messages to a static Set:
static final Set<String> messagesCache = new HashSet<String>();
// an enum constructor must not touch static fields, so fill the Set in a static initializer:
static {
    for (ErrorMessage errorMessage : values()) {
        messagesCache.add(errorMessage.message);
    }
}
// better exists() method:
public static boolean exists(String message) {
    return messagesCache.contains(message);
}
Or, as with other solutions, you could store only the hash codes of your strings. A hash code is simply a numeric representation of your string. It is not guaranteed to be unique ("Aa" and "BB" share the same hash code, for example), but it is usually unique enough to identify a message. Same solution as above:
static final Set<Integer> messagesHashCodes = new HashSet<Integer>();
// in a static initializer (enum constructors cannot access static fields), for each constant:
messagesHashCodes.add(constant.message.hashCode());
// better exists() method:
public static boolean exists(String message) {
    return messagesHashCodes.contains(message.hashCode());
}
(Of course, it would be a good idea to check for null values, etc.)
The enum version has one big advantage: if you want DIFFERENT actions taken for some of the messages, you can code them into the enum, for example...
SOME_MESSAGE_REQUIRING_AN_ACTION("...") {
    @Override
    public void doAction(StringBuilder finalString) {
        ...doSomething.
    }
}
...
public void doAction(StringBuilder finalString) {
    finalString.append( this.message );
    finalString.append( SOME_SEPERATOR );
}
public static void doAction(StringBuilder builder, String errorMessage) {
    if (exists(errorMessage)) {
        // delegate to the matching constant
        getByErrorMessage(errorMessage).doAction(builder);
    }
}
In this example, you CAN override the doAction method in each enum value if it should do more than, or something other than, appending the message to the StringBuilder. It would also be a nice touch to add a "null" constant such as UNKNOWN_MESSAGE to the enum that does nothing and is only there to allow easier handling:
UNKNOWN_MESSAGE(null) {
    @Override
    public void doAction(StringBuilder finalString) {
        // do nothing here
    }
}
public static ErrorMessage getByErrorMessage(String message) {
    // iterate over all enum constants, not over the String argument
    for (ErrorMessage errorMessage : values()) {
        if (errorMessage.message != null && errorMessage.message.equals(message)) {
            return errorMessage;
        }
    }
    return UNKNOWN_MESSAGE;
}
This way, you can simply pass every single string to your enum method doAction(StringBuilder, String) and get the result: if a message matches, it is added (and any other action taken); if not, it is ignored, null checks included.
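The fragments of this approach can be assembled into one compilable sketch. The constant names, the ";" separator and the "(retry)" marker are made up for illustration, and the lookup uses a static Map filled in a static initializer rather than a loop, which is one possible choice among several.

```java
import java.util.HashMap;
import java.util.Map;

enum ErrorMessageSketch {
    DISK_FULL("disk full"),
    TIMEOUT("timeout") {
        @Override
        void doAction(StringBuilder out) {
            super.doAction(out);                      // default: append the message
            out.append("(retry)").append(SEPARATOR);  // plus a constant-specific extra
        }
    },
    UNKNOWN(null) {
        @Override
        void doAction(StringBuilder out) {
            // deliberately does nothing
        }
    };

    static final String SEPARATOR = ";";
    private static final Map<String, ErrorMessageSketch> BY_TEXT = new HashMap<>();

    static {
        // Runs after all constants exist, so their text fields are already set.
        for (ErrorMessageSketch m : values()) {
            if (m.text != null) BY_TEXT.put(m.text, m);
        }
    }

    private final String text;

    ErrorMessageSketch(String text) {
        this.text = text;
    }

    void doAction(StringBuilder out) {
        out.append(text).append(SEPARATOR);
    }

    static ErrorMessageSketch lookup(String text) {
        return BY_TEXT.getOrDefault(text, UNKNOWN);
    }

    static void doAction(StringBuilder out, String text) {
        lookup(text).doAction(out);
    }
}
```

Calling ErrorMessageSketch.doAction(builder, part) for every part of the third-party output appends only the known messages; unknown or null parts fall through to UNKNOWN and leave the builder untouched.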
On the other hand, if your messages change quite often, you might not want to do a release for every change and would rather keep the values in a database. In this case, I would use the hashCode() of the message as an id (as I said, typically unique enough) and load everything into memory when the application starts. That lets you, for example, build a Set of hash codes again to compare your error messages' hash codes against.
protected void init() {
    // load all error messages from the database
    // put them into a Map<Integer, String> (hashCode -> value) or even just a Set<Integer> (hash codes)
}
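For the matching step itself, here is a minimal sketch. It assumes the known messages have already been loaded into a Set (from the database or a property file) and that the third-party output is delimited by a single backslash, as described in the question. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class KnownMessageFilter {

    private final Set<String> knownMessages;

    KnownMessageFilter(Set<String> knownMessages) {
        // Defensive copy, so later changes to the caller's Set do not leak in.
        this.knownMessages = new HashSet<>(knownMessages);
    }

    // Splits the raw output on the backslash delimiter and keeps the known parts, in order.
    List<String> findKnown(String rawOutput) {
        List<String> hits = new ArrayList<>();
        if (rawOutput == null) {
            return hits;
        }
        // "\\\\" is the regex for one literal backslash.
        for (String part : rawOutput.split("\\\\")) {
            String candidate = part.trim();
            if (knownMessages.contains(candidate)) {
                hits.add(candidate);
            }
        }
        return hits;
    }
}
```

With around 150 entries, a HashSet lookup per part is cheap, so loading all messages once at startup and matching in memory should be sufficient whichever store they come from.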

Text representation of the content of a TableView

For testing purposes (using JemmyFX), I want to check that the content of a TableView is appropriately formatted. For example: one column is of type Double and a cell factory has been applied to show the number as a percent: 20%.
How can I verify that when the value is 0.2d, the cell is showing as 20%?
Ideally I am looking for something along those lines:
TableColumn<VatInvoice, Double> percentVat = ...
assertEquals(percentVat.getTextualRepresentation(), "20%");
Note: I have tried to use the TableCell directly like below but getText() returns null:
TableCell<VatInvoice, Double> tc = percentVat.getCellFactory().call(percentVat);
tc.itemProperty().set(0.2);
assertEquals(tc.getText(), "20%"); //tc.getText() is null
The best I have found so far, using JemmyFX, is the following:
public String getCellDataAsText(TableViewDock table, int row, int column) {
final TableCellItemDock dock = new TableCellItemDock(table.asTable(), row, column);
return dock.wrap().waitState(new State<String>() {
@Override public String reached() {
return dock.wrap().cellWrap().getControl().getText();
}
});
}
You can try editing the cell factory.
percentVat.setCellFactory(new Callback<TableColumn<VatInvoice, Double>, TableCell<VatInvoice, Double>>(){
    @Override
    public TableCell<VatInvoice, Double> call(TableColumn<VatInvoice, Double> param){
        return new TableCell<VatInvoice, Double>(){
            @Override
            public void updateItem(Double item, boolean isEmpty){
                super.updateItem(item, isEmpty);
                //...logic to format the text
                assertEquals(getText(), "20%");
            }
        };
    }
});
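Another way to make the check independent of a live scene graph is to keep the formatting rule in a plain function that the cell's updateItem delegates to, and to unit test that function directly. The sketch below is stdlib only; PercentText and asPercent are hypothetical names, and the US locale is an assumption.

```java
import java.text.NumberFormat;
import java.util.Locale;

final class PercentText {

    // Returns the text a percent cell would show; empty for a missing value.
    static String asPercent(Double value) {
        if (value == null) {
            return "";
        }
        NumberFormat format = NumberFormat.getPercentInstance(Locale.US);
        return format.format(value);
    }
}
```

Inside the cell, updateItem would then call setText(PercentText.asPercent(item)), and the test only needs to assert on the returned String.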

Display image in table

I am trying to insert an image into a table view in JavaFX. Here is how I set up my table view:
TableColumn prodImageCol = new TableColumn("IMAGES");
prodImageCol.setCellValueFactory(new PropertyValueFactory<Product, Image>("prodImage"));
prodImageCol.setMinWidth(100);
// setting cell factory for product image
prodImageCol.setCellFactory(new Callback<TableColumn<Product,Image>,TableCell<Product,Image>>(){
@Override
public TableCell<Product,Image> call(TableColumn<Product,Image> param) {
TableCell<Product,Image> cell = new TableCell<Product,Image>(){
public void updateItem(Product item, boolean empty) {
if(item!=null){
ImageView imageview = new ImageView();
imageview.setFitHeight(50);
imageview.setFitWidth(50);
imageview.setImage(new Image(product.getImage()));
}
}
};
return cell;
}
});
viewProduct.setEditable(false);
viewProduct.getColumns().addAll(prodImageCol, prodIDCol, prodNameCol, prodDescCol, prodPriceCol, col_action);
viewProduct.getItems().setAll(product.populateProductTable(category));
private SimpleObjectProperty prodImage;
public void setprodImage(Image value) {
prodImageProperty().set(value);
}
public Object getprodImage() {
return prodImageProperty().get();
}
public SimpleObjectProperty prodImageProperty() {
if (prodImage == null) {
prodImage = new SimpleObjectProperty(this, "prodImage");
}
return prodImage;
}
And this is how I retrieve the image from database:
Blob blob = rs.getBlob("productImage");
byte[] data = blob.getBytes(1, (int) blob.length());
bufferedImg = ImageIO.read(new ByteArrayInputStream(data));
image = SwingFXUtils.toFXImage(bufferedImg, null);
However I am getting error at the setting up of table view: imageview.setImage(new Image(product.getImage()));
The error message as:
no suitable constructor found for Image(Image)
constructor Image.Image(String,InputStream,double,double,boolean,boolean,boolean) is not applicable
(actual and formal argument lists differ in length)
constructor Image.Image(int,int) is not applicable
(actual and formal argument lists differ in length)
constructor Image.Image(InputStream,double,double,boolean,boolean) is not applicable
(actual and formal argument lists differ in length)
constructor Image.Image(InputStream) is not applicable
(actual argument Image cannot be converted to InputStream by method invocation conversion)
constructor Image.Image(String,double,double,boolean,boolean,boolean) is not applicable
(actual and formal argument lists differ in length)
constructor Image.Image(String,double,double,boolean,boolean) is not applicab...
I did manage to retrieve and display an image inside an image view; however, I can't display it in a table column. Any help would be appreciated. Thanks in advance.
The problem causing the compile error is that your method product.getImage() already returns a javafx.scene.image.Image. There's no need to do anything else at this point: you have an image, so use it (you were trying to construct new Image(Image), which is not possible). This is what you want to be using:
imageview.setImage(product.getImage());
Your second problem is that while you're creating an ImageView every time you update the cell, you're not doing anything with it. Here's your original code:
TableCell<Product,Image> cell = new TableCell<Product,Image>(){
public void updateItem(Product item, boolean empty) {
if(item!=null){
ImageView imageview = new ImageView();
imageview.setFitHeight(50);
imageview.setFitWidth(50);
imageview.setImage(new Image(product.getImage()));
}
}
};
return cell;
Like @tomsontom suggested, I'd recommend using setGraphic(Node) to attach your ImageView to the TableCell. So you might end up with something like this:
// Set up the ImageView
final ImageView imageview = new ImageView();
imageview.setFitHeight(50);
imageview.setFitWidth(50);
// Set up the cell
TableCell<Product,Image> cell = new TableCell<Product,Image>(){
    @Override
    public void updateItem(Image item, boolean empty) {
        super.updateItem(item, empty);
        // the column's value type is Image, so the item already is the image
        imageview.setImage(empty ? null : item);
    }
};
// Attach the ImageView to the cell
cell.setGraphic(imageview);
return cell;
The first point @tomsontom was making is that your method of creating an Image is a little roundabout. Sure, it seems to work... but there's a simpler way. Originally you were using:
bufferedImg = ImageIO.read(new ByteArrayInputStream(data));
image = SwingFXUtils.toFXImage(bufferedImg, null);
But a better way of doing it would be switching those lines with:
image = new Image(new ByteArrayInputStream(data));
Why not create the Image directly from the data with new Image(new ByteArrayInputStream(data))? There is no need to rewrap it or use Swing stuff.
I don't see a public Image(Object) constructor in FX8. Why pass it anyway if you already have an Image instance?
You need to set the ImageView on the cell with setGraphic().

GXT 3.x EditorGrid: choose cell editor type on a cell by cell basis

Is there any way to define the editor type on a cell-by-cell basis in GXT 3.0?
I need to create a transposed table: the columns become rows and the rows become columns. That being the case, a column (from a normal table's point of view) will have various editor types, whereas a row will have a single editor type.
I am trying the following approach. It seems to work and opens the right editor for each data type, but when I click outside the cell, the editor doesn't close/hide.
I would really appreciate it if someone could point me in the right direction.
final GridInlineEditing<MyModel> editing = new GridInlineEditing<MyModel>(mygrid){
@SuppressWarnings("unchecked")
@Override public <O> Field<O> getEditor(ColumnConfig<MyModel, ?> columnConfig) {
if(valueColumnName.equals(columnConfig.getHeader().asString())) {
MyModel myModel = tree.getSelectionModel().getSelectedItem();
if(MyModelType.STRING.equals(myModel.getMyModelType())) {
TextField textField = new TextField();
textField.setAllowBlank(Boolean.FALSE);
return (Field<O>) textField;
}
else {
TextArea textField = new TextArea();
textField.setAllowBlank(Boolean.FALSE);
return (Field<O>) textField;
}
}
return super.getEditor(columnConfig);
}
};
editing.setClicksToEdit(ClicksToEdit.TWO);
PS:
This is similar to question below; but answer is specific to post GXT 3.0. I am new to stackoverflow and it seems recommendation was to create new question instead of adding new post to old thread.
GXT EditorGrid: choose cell editor type on a cell by cell basis
After playing around all day, my colleague (Praveen) and I figured it out. Instead of overriding GridInlineEditing's getEditor() method, override the startEditing() method. You will also need converters if you have data such as Date, List, etc. Sample code is below; hope this helps others.
final GridInlineEditing<MyModel> editing = new GridInlineEditing<MyModel>(tree){
@Override public void startEditing(GridCell cell) {
MyModel myModel= tree.getSelectionModel().getSelectedItem();
if(MyModelType.TEXT.equals(myModel.getContextVariableType())) {
TextArea textField = new TextArea();
textField.setAllowBlank(Boolean.FALSE);
super.addEditor(valueColumn, textField);
}
else if(MyModelType.BOOLEAN.equals(myModel.getContextVariableType())) {
SimpleComboBox<String> simpleComboBox = new SimpleComboBox<String>(new StringLabelProvider<String>());
simpleComboBox.setTriggerAction(TriggerAction.ALL);
simpleComboBox.add("YES");
simpleComboBox.add("NO");
super.addEditor(valueColumn, simpleComboBox);
}
else if(MyModelType.INTEGER.equals(myModel.getContextVariableType())) {
SpinnerField<Integer> spinnerField = new SpinnerField<Integer>(new IntegerPropertyEditor());
spinnerField.setIncrement(1);
Converter<String, Integer> converter = new Converter<String, Integer>(){
@Override public String convertFieldValue(Integer object) {
String value = "";
if(object != null) {
value = object.toString();
}
return value;
}
@Override public Integer convertModelValue(String object) {
Integer value = 0;
if(object != null && object.trim().length() > 0) {
value = Integer.parseInt(object);
}
return value;
}
};
super.addEditor(valueColumn, converter, (Field)spinnerField);
}
else {
TextField textField = new TextField();
textField.setAllowBlank(Boolean.FALSE);
super.addEditor(valueColumn, textField);
}
super.startEditing(cell);
}
};
editing.setClicksToEdit(ClicksToEdit.TWO);
I think the reason the fields are not closing is that you are not actually adding them to the GridInlineEditing class.
In the parts where you have the following return statements:
return (Field<O>) textField;
Those text fields are never added to the grid.
I would try substituting the following code for your first two return statements:
super.addEditor(columnConfig, (Field<O>) textField);
This adds the editor to some maps used by AbstractGridEditing. Specifically, the AbstractGridEditing.removeEditor(GridCell, Field<?>) method, which is used in GridInlineEditing.doCompleteEditing() and GridInlineEditing.cancelEditing() needs the field to be in the map so it can be detached from its parent.

Is it possible to extract text by page for word/pdf files using Apache Tika?

All the documentation I can find seems to suggest I can only extract the entire file's content. But I need to extract pages individually. Do I need to write my own parser for that? Is there some obvious method that I am missing?
Actually, Tika does handle pages (at least in PDF) by sending the elements <div><p> before a page starts and </p></div> after a page ends. You can easily set up a page count in your handler using this (counting pages using only <p>):
public abstract class MyContentHandler implements ContentHandler {
private String pageTag = "p";
protected int pageNumber = 0;
...
@Override
public void startElement (String uri, String localName, String qName, Attributes atts) throws SAXException {
if (pageTag.equals(qName)) {
startPage();
}
}
@Override
public void endElement (String uri, String localName, String qName) throws SAXException {
if (pageTag.equals(qName)) {
endPage();
}
}
protected void startPage() throws SAXException {
pageNumber++;
}
protected void endPage() throws SAXException {
return;
}
...
}
When doing this with PDF you may run into the problem that the parser doesn't send text lines in the proper order; see Extracting text from PDF files with Apache Tika 0.9 (and PDFBox under the hood) on how to handle this.
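A stdlib-only sketch of the same idea that collects the text of each page rather than only counting. It assumes the XHTML stream wraps every page in <div class="page">, which is how I understand Tika's PDF output, but this should be checked against the Tika version in use. The class name PageTextCollector and the sample input are made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class PageTextCollector extends DefaultHandler {

    private final List<String> pages = new ArrayList<>();
    private StringBuilder current; // null while outside a page
    private int depth;             // div nesting inside the current page

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!"div".equals(qName)) {
            return;
        }
        if (current == null && "page".equals(atts.getValue("class"))) {
            current = new StringBuilder(); // assumed page marker: <div class="page">
            depth = 1;
        } else if (current != null) {
            depth++;                       // nested div inside a page
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current != null && "div".equals(qName) && --depth == 0) {
            pages.add(current.toString().trim());
            current = null;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            current.append(ch, start, length);
        }
    }

    List<String> pages() {
        return pages;
    }

    // Parses a hand-written XHTML string; with Tika, the collector would be the handler instead.
    static List<String> split(String xhtml) throws Exception {
        PageTextCollector collector = new PageTextCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), collector);
        return collector.pages();
    }
}
```

With Tika, an instance of such a handler would be passed as the ContentHandler argument of parser.parse(...), and pages() read afterwards.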
You'll need to work with the underlying libraries - Tika doesn't do anything at the page level.
For PDF files, PDFBox should be able to give you some page stuff. For Word, HWPF and XWPF from Apache POI don't really do page level things - the page breaks aren't stored in the file, but instead need to be calculated on the fly based on the text + fonts + page size...
You can get the number of pages in a PDF using the metadata object's xmpTPg:NPages key, as in the following:
Parser parser = new AutoDetectParser();
Metadata metadata = new Metadata();
ParseContext parseContext = new ParseContext();
parser.parse(fis, handler, metadata, parseContext);
metadata.get("xmpTPg:NPages");
