Store large hidden text/string to a PDF using iTextSharp

I want to store a large string somewhere hidden in a PDF document. Right now I write that text into a hidden text field. The problem is that once the string grows to around 10 MB, I start getting OutOfMemory errors.
What is the best way to store a large hidden string/text in a PDF document using iTextSharp? The text must also be retrievable later.

Such private data can be stored in PieceInfo dictionaries; cf. also David's answer to the OP's follow-up question.
This answer to the older question "Insert hidden digest in pdf using iText library" shows how to use PieceInfo dictionaries in general with iText/Java (differences from iTextSharp/C# should be minimal here).
As the OP talks about data of 10 MB and up, he may want to use PDF streams instead of strings.
The DocumentPieceInfo helper class provided in that older answer can be used with PDF streams for big data like this (again in Java, as I'm mostly living on the Java side; again, porting to C# should be easy):
Storing document PieceInfo data
PdfName appName = new PdfName("MYAPP");
PdfName dataName = new PdfName("BigData");
DocumentPieceInfo dpi = new DocumentPieceInfo();
PdfReader reader = new PdfReader(...);
PdfStamper stamper = new PdfStamper(reader, ...);
InputStream in = ... BIG DATA INPUT STREAM ...;
PdfStream stream = new PdfStream(in, stamper.getWriter());
stream.flateCompress();
PdfIndirectObject ref = stamper.getWriter().addToBody(stream);
stream.writeLength();
in.close();
dpi.addPieceInfo(reader, appName, dataName, ref.getIndirectReference());
stamper.close();
Retrieving document PieceInfo data
PdfName appName = new PdfName("MYAPP");
PdfName dataName = new PdfName("BigData");
DocumentPieceInfo dpi = new DocumentPieceInfo();
PdfReader reader = new PdfReader("target/test-outputs/test-with-piece-info.pdf");
PdfObject myDataObject = dpi.getPieceInfo(reader, appName, dataName);
myDataObject = PdfReader.getPdfObject(myDataObject);
byte[] myData = PdfReader.getStreamBytes((PRStream)myDataObject);
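The flateCompress() call in the storing code applies Flate (zlib/deflate) compression to the stream. To illustrate why feeding the data through a stream helps with large payloads, here is a minimal sketch in plain Java using only java.util.zip, with no iText calls. The class and method names (FlateStreamSketch, deflate, inflate) are hypothetical, and the payload is a made-up stand-in for the hidden text. The input is read in small chunks, so it never has to exist as one giant string.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

class FlateStreamSketch {

    /** Copies everything from in to out through a small fixed-size buffer. */
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
    }

    /** Deflate-compresses the input chunk by chunk and returns the compressed bytes. */
    static byte[] deflate(InputStream in) {
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflater = new DeflaterOutputStream(compressed)) {
            copy(in, deflater);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return compressed.toByteArray();
    }

    /** Reverses deflate(): returns the original, uncompressed bytes. */
    static byte[] inflate(byte[] compressed) {
        ByteArrayOutputStream restored = new ByteArrayOutputStream();
        try (InflaterInputStream inflater =
                 new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            copy(inflater, restored);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return restored.toByteArray();
    }

    public static void main(String[] args) {
        // Stand-in for the large hidden text: repetitive data, roughly 1 MB.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 50_000; i++) {
            sb.append("hidden payload line\n");
        }
        byte[] original = sb.toString().getBytes(StandardCharsets.UTF_8);

        byte[] packed = deflate(new ByteArrayInputStream(original));
        byte[] unpacked = inflate(packed);

        System.out.println(original.length + " bytes -> " + packed.length + " compressed");
        System.out.println("round trip ok: " + Arrays.equals(original, unpacked));
    }
}
```

In the iText code above the compressed bytes go into the PDF writer instead of an in-memory array; the sketch only shows the chunked compress and decompress round trip.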

Related

Paragraph styles breaking formatting in poi pdf converter

I'm using Poi/Xdocreport pdf converter on a Tomcat server. Very simple implementation in code:
InputStream doc = new FileInputStream(new File(documentPath));
XWPFDocument document = new XWPFDocument(doc);
PdfOptions options = PdfOptions.getDefault();
File pdfFile = new File(pdfFilePath);
OutputStream out = new FileOutputStream(pdfFile);
PdfConverter.getInstance().convert(document, out, options);
While it works fine for most docx documents, when the document uses heading paragraph styles, the text overlaps when wrapping (the heading is forced onto a single row), which makes the PDF practically unreadable. Versions are:
fr.opensagres.poi.xwpf.converter.pdf-2.0.4
xdocreport version 2.0.4
poi 5.2.2.
Any help very much appreciated!
Source docx: (image not available)
Resulting pdf: (image not available)

Excel and Libre Office conflict over Open XML output

Open XML is generating .xlsx files that can be read by Open Office, but not by Excel itself.
With this as my starting point (Export DataTable to Excel with Open Xml SDK in c#), I added code to create a .xlsx file. When I try to open it with Excel, I'm asked whether I want to repair the file. Saying yes gets "The workbook cannot be opened or repaired by Microsoft Excel because it's corrupt." After many hours of trying to jiggle the data from my table to make this work, I finally threw up my hands in despair and made a spreadsheet with a single number in the first cell.
Still corrupt.
Renaming it to .zip and exploring shows intact .xml files. On a whim, I took a legit .xlsx file created by Excel, unzipped it, rezipped it without changing the contents and renamed it back to .xlsx. Excel declared it corrupt. So this is clearly not a content issue but a file format issue. Giving up on Friday, I sent some of the sample files home and opened them there with Libre Office. There were no issues at all: the file content was correct and Calc had no problem. I'm using Excel for Office 365, 32 bit.
// ignore the bits (var list) that get data from the database. I've reduced this to just the output of a single header line
List<ReportFilingHistoryModel> list = DB.Reports.Report.GetReportClientsFullHistoryFiltered<ReportFilingHistoryModel>(search, client, report, signature);
MemoryStream memStream = new MemoryStream();
using (SpreadsheetDocument workbook = SpreadsheetDocument.Create(memStream, SpreadsheetDocumentType.Workbook))
{
    var workbookPart = workbook.AddWorkbookPart();
    workbook.WorkbookPart.Workbook = new Workbook();
    workbook.WorkbookPart.Workbook.Sheets = new Sheets();
    var sheetPart = workbook.WorkbookPart.AddNewPart<WorksheetPart>();
    var sheetData = new SheetData();
    sheetPart.Worksheet = new Worksheet(sheetData);
    Sheets sheets = workbook.WorkbookPart.Workbook.GetFirstChild<Sheets>();
    string relationshipId = workbook.WorkbookPart.GetIdOfPart(sheetPart);
    uint sheetId = 1;
    if (sheets.Elements<Sheet>().Count() > 0)
    {
        sheetId = sheets.Elements<Sheet>().Select(s => s.SheetId.Value).Max() + 1;
    }
    Sheet sheet = new Sheet() { Id = relationshipId, SheetId = sheetId, Name = "History" };
    sheets.Append(sheet);
    Row headerRow = new Row();
    foreach (var s in "Foo|Bar".Split('|'))
    {
        var cell = new Cell();
        cell.DataType = CellValues.Number;
        cell.CellValue = new CellValue("5");
        headerRow.AppendChild(cell);
    }
    sheetData.AppendChild(headerRow);
}
memStream.Seek(0, SeekOrigin.Begin);
Guid result = DB.Reports.Report.AddClientHistoryList("test.xlsx", memStream.GetBuffer(), "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
return Ok(result);
This should just work. I've noticed other Stack Overflow discussions that direct back to the first link I mentioned above. I seem to be doing it right (and Calc concurs). There have been discussions of shared strings and whatnot, but by using plain numbers I shouldn't be having issues. What am I missing here?

In working on this, I had assumed that some extraneous junk on the end of a .zip file is harmless. 7-Zip, Windows Explorer and Libre Office all seem to agree (as does some other zip program I used at home whose name escapes me). Excel, however, does not. Using the buffer returned by memStream.GetBuffer() was fine, but using that buffer's length was not: the buffer can be larger than the data actually written to it. (The preceding Seek() was unnecessary.) Limiting the write of the data to a length equal to the current output position keeps Excel from going off the rails.
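The effect described above can be reproduced outside .NET. The following is a minimal sketch in plain Java, assuming that padding an archive with zero bytes is a fair stand-in for handing out a whole backing buffer; the class and method names are hypothetical. It shows that a tolerant, sequential reader (java.util.zip.ZipInputStream) lists the same entries with or without the trailing bytes, and that trimming to the written length restores the exact archive. It cannot show Excel's stricter behaviour.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class TrailingBytesDemo {

    /** Builds a one-entry zip archive in memory and returns exactly the bytes written. */
    static byte[] buildZip() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("sheet1.xml"));
            zip.write("<worksheet/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Simulates a raw backing buffer: the written data followed by unused zero bytes. */
    static byte[] padLikeRawBuffer(byte[] written, int capacity) {
        return Arrays.copyOf(written, capacity);
    }

    /** Counts entries with a sequential reader, which stops at the central directory. */
    static int countEntries(byte[] archive) {
        int count = 0;
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            while (zip.getNextEntry() != null) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] exact = buildZip();
        byte[] padded = padLikeRawBuffer(exact, exact.length + 512);

        System.out.println("written: " + exact.length + " bytes, raw buffer: " + padded.length + " bytes");
        System.out.println("entries (exact):  " + countEntries(exact));
        System.out.println("entries (padded): " + countEntries(padded));

        // Trimming the buffer to the written length gives back the exact archive.
        byte[] trimmed = Arrays.copyOf(padded, exact.length);
        System.out.println("trimmed equals exact: " + Arrays.equals(trimmed, exact));
    }
}
```

In the C# code from the question, MemoryStream.ToArray() returns only the bytes written, so it avoids the padding without any manual length bookkeeping.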

Saving NSMutableAttributedString with image attachments in Core Data

I am declaring an NSMutableAttributedString and appending different attributed strings to it. These attributed strings may contain images as NSTextAttachments.
var historyText : NSMutableAttributedString = NSMutableAttributedString(string: "")
let imageAttachment = NSTextAttachment()
imageAttachment.image = UIImage(named: "some_image")
imageAttachment.bounds = CGRect(x:0, y:-3.0, width: (imageAttachment.image?.size.width)!, height:(imageAttachment.image?.size.height)!)
let imageString = NSAttributedString(attachment: imageAttachment)
historyText.append(imageString)
I then save this attributed text in Core Data. But when I retrieve it from Core Data, the imageAttachment.bounds are lost.
Please suggest a way to preserve these bounds.
Thanks in advance.

Image saving and retrieving from database

I have to take an image from my system and save it as bytes to a folder on the server. Then I have to store the path of that file (the .txt file holding the converted image) in the database, in a table created for it. Finally, I want to retrieve it from the database. It should be a Windows application. Is this possible?

Yes, it is possible.
You are just storing the path of the image file that you created.
The path is just a simple string.
When retrieving, take the path from the database and load the image from that path into the PictureBox in the Windows application.
Example:
Selecting the image file:
string DestinationPath = "D:\\test.jpg";
OpenFileDialog ofd = new OpenFileDialog();
if (ofd.ShowDialog() == System.Windows.Forms.DialogResult.OK)
{
    byte[] bt = File.ReadAllBytes(ofd.FileName);
    File.WriteAllBytes(DestinationPath, bt);
}
//Store DestinationPath into the database.
Retrieving and displaying in a PictureBox:
string pathFromDatabase = "D:\\test.jpg"; //Retrieve from database
pboxDisplay.Image = Image.FromFile(pathFromDatabase); //Assuming pboxDisplay as the PictureBox control
Hope it helps.

Try this:
byte[] bt = File.ReadAllBytes("C:\\test.jpg");
File.WriteAllBytes("C:\\test1.jpg", bt);
You can upload the byte array bt to the database. When retrieving the bytes from the database, write them back to a file with File.WriteAllBytes.

How to keep the original page rotation in iTextSharp (dll)

I would like to create a project that reads from Excel, writes to a PDF, and prints that PDF.
One cell in the Excel file holds the path of the original PDF on the computer or server, and the next cell holds the text to write at the top of the second PDF.
Here is the problem: when the original PDF is landscape (rotated), my program creates a copy of it and writes the text from Excel at the top of the copy, but the landscape page comes out rotated by 270 degrees. That is not OK. For portrait pages the program works: the copy is fine and the text at the top of the copy is fine.
Where is the problem in my code?
Code:
public int urediPDF(string inTekst)
{
    if (inTekst != "0")
    {
        string pisava_arialBD = @"..\debug\arial.ttf";
        string oldFile = null;
        string inText = null;
        string indeks = null;
        // split the input string
        string[] vhod = inTekst.Split('#');
        oldFile = vhod[0];
        inText = vhod[1];
        indeks = vhod[2];
        string newFile = @"c:\da\2";
        // open the PDF reader
        PdfReader reader = new PdfReader(oldFile);
        Rectangle size = reader.GetPageSizeWithRotation(reader.NumberOfPages);
        Document document = new Document(size);
        // open the PDF writer
        FileStream fs = new FileStream(newFile + "-" + indeks + ".pdf", FileMode.Create, FileAccess.Write);
        PdfWriter writer = PdfWriter.GetInstance(document, fs);
        //document.Open();
        document.OpenDocument();
        label2.Text = ("Status: " + reader.GetPageRotation(reader.NumberOfPages).ToString());
        // get the content layer used to build the PDF
        PdfContentByte cb = writer.DirectContent;
        // choose the font
        BaseFont bf = BaseFont.CreateFont(pisava_arialBD, BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
        cb.SetColorFill(BaseColor.RED);
        cb.SetFontAndSize(bf, 8);
        // write the text into the PDF
        cb.BeginText();
        string text = inText;
        // choose the coordinates for the text (720 degrees rotation = landscape, 90 degrees = portrait)
        if (reader.GetPageRotation(1) == 720) // landscape layout
        {
            cb.ShowTextAligned(1, text, 10, 450, 0);
            cb.EndText();
        }
        else // portrait layout
        {
            cb.ShowTextAligned(1, text + " - pokončen", 10, 750, 0);
            cb.EndText();
        }
        // create the new page and add it to the pdf
        PdfImportedPage page = writer.GetImportedPage(reader, reader.NumberOfPages);
        cb.AddTemplate(page, 0, 0);
        // close the streams and voilá the file should be changed :)
        document.Close();
        fs.Close();
        writer.Close();
        reader.Close();
    }
    else
    {
        label2.Text = "Status: Končano zapisovanje";
        return 0;
    }
    return 0;
}
Picture of the wrong PDF: (image not available)

As explained many times before ("ITextSharp include all pages from the input file", "Itext pdf Merge : Document overflow outside pdf (Text truncated) page and not displaying", and so on), you should read chapter 6 of my book iText in Action (you can find the C# version of the examples here).
You are using a combination of Document, PdfWriter and PdfImportedPage to split a PDF. Please tell me who made you do it this way, so that I can curse the person who inspired you (because I've answered this question hundreds of times before, and I'm getting tired of repeating myself). These classes aren't a good choice for that job:
you lose all interactivity,
you need to rotate the content yourself if the page is in landscape,
you need to take the original page size into account,
...
Your problem is similar to this one: "itextsharp: unexpected elements on copied pages". Is there any reason why you didn't read the documentation? If you say: "I didn't have the time", please believe me if I say that I have almost 20 years of experience as a developer, and I've never seen "reading documentation" as a waste of time.
Long story short: read the documentation, replace PdfWriter with PdfCopy, replace AddTemplate() with AddPage().
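For readers wondering what "take the original page size into account" and "rotate the content yourself" involve, here is a minimal sketch in plain Java with no iText calls. The helper names (normalizeRotation, displayedSize, isLandscape) are hypothetical, and the page dimensions are example values. It relies only on the fact that PDF page rotation is specified in multiples of 90 degrees. A comparison against 720, as in the question, will not match if the library reports the rotation as 0, 90, 180 or 270.

```java
class PageOrientationSketch {

    /** Normalizes a rotation in degrees to one of 0, 90, 180 or 270. */
    static int normalizeRotation(int degrees) {
        if (degrees % 90 != 0) {
            throw new IllegalArgumentException("page rotation must be a multiple of 90: " + degrees);
        }
        return ((degrees % 360) + 360) % 360;
    }

    /** Returns {width, height} of the page as displayed, after applying the rotation. */
    static float[] displayedSize(float width, float height, int rotation) {
        int r = normalizeRotation(rotation);
        boolean swapped = (r == 90 || r == 270);
        return swapped ? new float[] {height, width} : new float[] {width, height};
    }

    /** A page is landscape when its displayed width exceeds its displayed height. */
    static boolean isLandscape(float width, float height, int rotation) {
        float[] size = displayedSize(width, height, rotation);
        return size[0] > size[1];
    }

    public static void main(String[] args) {
        // A4 portrait media box (595 x 842 points) stored with a 270 degree rotation.
        System.out.println(isLandscape(595f, 842f, 270)); // true
        System.out.println(isLandscape(595f, 842f, 0));   // false
        System.out.println(normalizeRotation(720));       // 0
    }
}
```

With PdfCopy and AddPage(), as recommended above, none of this bookkeeping is needed, because the page is copied together with its rotation.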
