How to convert drawn shapes of an HWPFDocument to XSL-FO? - apache-poi

I am trying to convert a .doc file to PDF.
To do this, I am going through .doc > XSL-FO > PDF.
When converting the .doc to XSL-FO, I am unable to convert drawn objects such as checkboxes, rectangles, and squares.
Each one gets converted as shown below, when it should actually be a box.
The conversion code I am using is:
HWPFDocumentCore wordDocument = WordToFoUtils.loadDoc(is);
WordToFoConverter wordToFoConverter = new WordToFoConverter(
XMLHelper.getDocumentBuilderFactory().newDocumentBuilder().newDocument());
wordToFoConverter.processDocument(wordDocument);
File foFile = new File("D:\\Testing\\testing\\" + "test.fo");
ByteArrayOutputStream out = new ByteArrayOutputStream();
StreamResult streamResult = new StreamResult(out);
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.transform(new DOMSource(wordToFoConverter.getDocument()), streamResult);
String result = org.apache.commons.lang3.StringUtils.normalizeSpace(
        java.text.Normalizer.normalize(new String(out.toByteArray(), "UTF-8"),
                java.text.Normalizer.Form.NFD));
result = URLEncoder.encode(result, "UTF-8");
Apache FOP is then used to convert the .fo file to PDF.
The .doc file looks as shown below,
and WordToFoConverter converted the boxes as shown below.

In plain-text formats such as XML, check boxes usually come from basic symbol fonts.
They are shown as ☐ when unchecked, or as ☑ or ☒ when checked.
In any basic text stream it should be relatively easy to find and replace them. However, beware of the encoding, especially UTF; the characters are best copied from a clean set such as Zapf Dingbats or the Adobe TTF Symbol font.
Many have a Unicode description, but do test visually that they work after copying and pasting from the PDF, since the font mapping may not always tally.
8999 ⌧ ⌧ \002327 0x2327 X in a rectangle box
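To check which code points a text stream actually contains, a small Java sketch like the following can help. It is only an illustration: the class and method names are hypothetical and not part of Apache POI or FOP, and the code points used are the standard Unicode values for the symbols mentioned above (U+2610, U+2611, U+2612, U+2327).

```java
// Hypothetical helper for inspecting checkbox-like symbols; not a POI or FOP API.
final class CheckboxSymbols {

    // Describes a code point in the usual "U+XXXX" notation, with its
    // decimal value and the character itself.
    static String describe(int codePoint) {
        return String.format("U+%04X (decimal %d) %s",
                codePoint, codePoint, new String(Character.toChars(codePoint)));
    }

    public static void main(String[] args) {
        // Unchecked box, checked box, crossed box, X in a rectangle box.
        int[] symbols = {0x2610, 0x2611, 0x2612, 0x2327};
        for (int codePoint : symbols) {
            System.out.println(describe(codePoint));
        }
    }
}
```

Whether the glyphs display correctly on the console depends on the terminal encoding and font, which is the same caveat that applies to the PDF font mapping.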
By far the simplest way to use Unicode text is as Rich Text, which on the Windows command line you can export as a Portable Document File (PDF) using Write.exe, which can read TXT and /PrintTo PDF. (You don't need the lower-left dialog; it is only there to illustrate the export settings.)
It's much simpler than XML, where just one character requires:
<w:rPr>
<w:rFonts w:ascii="Segoe UI Symbol" w:hAnsi="Segoe UI Symbol" w:cs="Segoe UI Symbol" w:eastAsia="Segoe UI Symbol"/>
<w:color w:val="auto"/>
<w:spacing w:val="0"/>
<w:position w:val="0"/>
<w:sz w:val="48"/>
<w:shd w:fill="auto" w:val="clear"/>
<w:vertAlign w:val="subscript"/>
</w:rPr>
<w:t xml:space="preserve">☑</w:t>

Related

Pdfbox-Android shows empty page

I recently started using the PdfBox-Android library because iText is under the AGPL. I tried running the following code.
PDDocument document = new PDDocument();
PDPage page= new PDPage();
document.addPage(page);
ByteArrayOutputStream outputStream=new ByteArrayOutputStream();
Bitmap bitmap=BitmapFactory.decodeFile(imagesObject.get(0).image); //imagesobject is string path
bitmap.compress(Bitmap.CompressFormat.PNG, 100, outputStream);
PDImageXObject pdImage = PDImageXObject.createFromFile(imagesObject.get(0).image,document);
PDPageContentStream contentStream = new PDPageContentStream(document, page,true,true);
contentStream.drawImage(pdImage,70,70,pdImage.getWidth(), pdImage.getHeight());
contentStream.close();
document.save(file);
document.close();
The PDF is saved with an empty page; no image is shown. I noticed that the PDF is 6 MB in size, which suggests the image has been written but is not visible. Any fix?
I am also using the ported library by TomRoush.
The generated PDF is linked here.
As discussed in the comments, the image had a .jpg extension in the name, but was a PNG image file. The PDImageXObject createFromFile(String imagePath, PDDocument doc) method assumes the file type by its extension, so it embedded the file 1:1 in the PDF and assigned a DCT filter. Both of these would have been correct for a jpeg file, but not for png.
So the solution would be to either rename the file, or use the createFromFileByContent method.
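To confirm a mismatch like this before handing the file to PDFBox, one could look at the first bytes of the file instead of its name. The following is a minimal sketch using only the JDK; the class and method names are hypothetical, and it only distinguishes the well-known PNG and JPEG signatures. It is not how createFromFileByContent is implemented.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper: guesses the image type from the file signature,
// ignoring the file extension.
final class ImageSniffer {
    // PNG files start with these eight bytes; JPEG files start with FF D8 FF.
    private static final int[] PNG_MAGIC = {0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};
    private static final int[] JPEG_MAGIC = {0xFF, 0xD8, 0xFF};

    private static boolean startsWith(byte[] data, int[] magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns "png", "jpeg", or "unknown" for the given leading bytes.
    static String sniffBytes(byte[] header) {
        if (startsWith(header, PNG_MAGIC)) {
            return "png";
        }
        if (startsWith(header, JPEG_MAGIC)) {
            return "jpeg";
        }
        return "unknown";
    }

    // Reads at most the first eight bytes of the file and classifies them.
    static String sniffFile(Path file) throws IOException {
        byte[] buffer = new byte[8];
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (total < buffer.length) {
                int n = in.read(buffer, total, buffer.length - total);
                if (n < 0) {
                    break;
                }
                total += n;
            }
        }
        return sniffBytes(Arrays.copyOf(buffer, total));
    }
}
```

If sniffFile reports "png" for a path ending in .jpg, the file is in the situation described above.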

Adding sticky note to existing PDF. losing original PDF content [duplicate]

I want to add an annotation comment to an existing PDF file using iTextSharp with C#.
Please give sample code for adding annotations to an existing PDF file.
Here is the PostScript for my annotation:
[/Contents (My Text contents) /Rect [100 600 150 550] /SrcPg 1 /Title (My Title text) /Color [0 0 1] /Subtype /Caret /ANN pdfmark
The iText(Sharp) example TimetableAnnotations1.java / TimetableAnnotations1.cs from chapter 7 of iText in Action — 2nd Edition shows how to add annotations to existing PDFs in general.
The central code is (in the C# example):
rect = GetPosition(screening);
annotation = PdfAnnotation.CreateText(
stamper.Writer, rect, movie.MovieTitle,
string.Format(INFO, movie.Year, movie.Duration),
false, "Help"
);
annotation.Color = WebColors.GetRGBColor(
"#" + movie.entry.category.color
);
stamper.AddAnnotation(annotation, page);
where stamper is a PdfStamper working on your PDF file; movie is a data structure the example retrieves title, text and color of the annotation from.
PdfAnnotation offers multiple other Create... methods to create other types of annotations.
rect = GetPosition(screening);
Can someone please explain why this is used? Is there any way to find the current cursor position (top, bottom, height, width)?
As for the annotation,
Document doc = new Document(PageSize.A4, 50, 50, 50, 50);
PdfWriter writer = PdfWriter.GetInstance(doc, new FileStream(@"C:\Users\Asus\Desktop\Test.pdf", FileMode.OpenOrCreate));
doc.AddDocListener(writer);
doc.Open();
doc.Add(new Annotation("annotation", "The text displayed in the sticky note", 100f, 500f, 200f, 600f));
doc.Close();
this works fine for me.

Can I retrieve the text color (or background color) of the character at the text cursor from QTextEdit?

I have a QTextEdit window with words and letters displayed in several colors. I want to be able to retrieve the color of each part of the text when processing the contents of the window. My attempt so far has been to save the entire contents as an html file and then parse through that to extract only the text with the color information. This is very cumbersome and difficult. I would much prefer to process the text using the QTextCursor if I could retrieve the color of the text at the cursor position. I have searched for the appropriate function but have not found one.
Is there a function to retrieve the color (or the format) at the QTextCursor position?
Or alternatively is there a way to retrieve each contiguous section of words and/or characters that have the same color (or format) with the format information?
Well I have found a way to do what I wanted. Here is the relevant code:
QTextCursor tc = qte->textCursor();
tc.movePosition(QTextCursor::Start, QTextCursor::MoveAnchor);
while(tc.movePosition(QTextCursor::NextCharacter, QTextCursor::MoveAnchor))
{
QTextCharFormat tcf = tc.charFormat();
int bg = tcf.background().color().rgb();
int fg = tcf.foreground().color().rgb();
printf("bg=%x fg=%x\n", bg, fg);
}
Any comments or improvements are welcome.
[Corrected above]: I originally had
QColor bg = tcf.background().color().rgb();
QColor fg = tcf.foreground().color().rgb();
but with .rgb() on the end, it converts QColor to int.

Generate Arabic content with PDFKit & nodeJS

I'm using PDFKit with Node.js to generate PDF files dynamically. The generation works fine, but I have a problem displaying Arabic characters, even when I set a font that supports Arabic.
The letters are rendered correctly, but the words are displayed character by character :(
Here's my code:
doc = new PDFDocument;
doc.pipe(fs.createWriteStream('output.pdf'));
var str = "فصل الربيع الزهور \n#nature #payesage #fleurs #plantes #vert #espace #temara #rabat #maroc #WeekEnd #balade #instamoment #instalife #instamaroc #photographie #macro #peace";
doc.font('resources/HelveticaNeueLTArabic-Roman.ttf').text(str);
Any thoughts or suggestions will be great.
Use the Amiri font; it supports Arabic.
const customFontRegular = fs.readFileSync(`./amiri/Amiri-Regular.ttf`);
const customFontBold = fs.readFileSync(`./amiri/Amiri-Bold.ttf`);
pdfDoc.registerFont(`Amiri-Regular`, customFontRegular);
pdfDoc.registerFont(`Amiri-Bold`, customFontBold);
And it can be used as
pdfDoc.font('Amiri-Regular').text("Hello world");

How to print R graphics to multiple pages of a PDF and multiple PDFs?

I know that
pdf("myOut.pdf")
will print to a PDF in R. What if I want to
1. Make a loop that prints subsequent graphs on new pages of a PDF file (appending to the end)?
2. Make a loop that prints subsequent graphs to new PDF files (one graph per file)?
Did you look at help(pdf) ?
Usage:
pdf(file = ifelse(onefile, "Rplots.pdf", "Rplot%03d.pdf"),
width, height, onefile, family, title, fonts, version,
paper, encoding, bg, fg, pointsize, pagecentre, colormodel,
useDingbats, useKerning)
Arguments:
file: a character string giving the name of the file. For use with
'onefile=FALSE' give a C integer format such as
'"Rplot%03d.pdf"' (the default in that case). (See
'postscript' for further details.)
For 1), you keep onefile at the default value of TRUE. Several plots go into the same file.
For 2), you set onefile to FALSE and choose a filename with the C integer format and R will create a set of files.
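To see what a C integer format such as "Rplot%03d.pdf" expands to, here is a small illustration in Java, whose String.format treats %03d the same way (zero-padded to three digits). This is not R code, and the class and method names are hypothetical; it only shows the naming pattern, assuming pages are numbered from 1.

```java
// Hypothetical illustration of how a "%03d" file name pattern expands.
final class PlotFileNames {

    // Fills the page number into a C-style integer format.
    static String pageFile(String pattern, int page) {
        return String.format(pattern, page);
    }

    public static void main(String[] args) {
        for (int page = 1; page <= 3; page++) {
            System.out.println(pageFile("Rplot%03d.pdf", page));
        }
    }
}
```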
Not sure I understand.
Appending to same file (one plot per page):
pdf("myOut.pdf")
for (i in 1:10){
plot(...)
}
dev.off()
New file for each loop:
for (i in 1:10){
pdf(paste("myOut",i,".pdf",sep=""))
plot(...)
dev.off()
}
pdf(file = "Location_where_you_want_the_file/name_of_file.pdf", title="if you want any")
plot() # Or other graphics you want to have printed in your pdf
dev.off()
You can plot as many things as you want in the PDF; the plots will be added to the PDF on separate pages.
dev.off() closes the connection to the file, and the PDF will be created. You will see something like
> dev.off()
null device 1
