Cannot convert HTML (http://www.google.co.in/) to PDF using iTextSharp

I am using iTextSharp to convert an HTML page (the source site is Google: http://www.google.co.in/) to PDF.
My code:
protected void Page_Load(object sender, EventArgs e)
{
    WebClient wc = new WebClient();
    string HTMLCode = wc.DownloadString("http://www.google.co.in/");
    var result = createPDF(HTMLCode);
}
private MemoryStream createPDF(string html)
{
    MemoryStream msOutput = new MemoryStream();
    TextReader reader = new StringReader(html);
    // step 1: create the document object
    Document document = new Document(PageSize.A4, 30, 30, 30, 30);
    // step 2: create a writer that listens to the document
    // and writes the PDF output to the memory stream
    PdfWriter writer = PdfWriter.GetInstance(document, msOutput);
    // step 3: create a worker to parse the HTML into the document
    HTMLWorker worker = new HTMLWorker(document);
    // step 4: open the document and start the worker on it
    document.Open();
    worker.StartDocument();
    // step 5: parse the HTML into the document
    worker.Parse(reader);
    // step 6: close the worker and the document
    worker.EndDocument();
    worker.Close();
    document.Close();
    return msOutput;
}
I took the createPDF function from here.
But I am getting the following error:
Unable to cast object of type 'iTextSharp.text.html.simpleparser.CellWrapper' to type 'iTextSharp.text.Paragraph'.
Is this a problem with the iTextSharp library? I am using itextsharp-dll-core-5.3.0.

No one listens! HTMLWorker is obsolete, won't be maintained, and has been replaced by XML Worker.

Related

OpenXml Cannot Read or Write To Azure Blob Storage with FileMode or FileAccess value is invalid for the stream

We are using the DocumentFormat.OpenXml SDK to open a Word template stored in Azure Blob Storage, write to it, and save it. There are two issues.
The first occurs when opening the document:
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(streamToWrite, true))
This throws:
DocumentFormat.OpenXml.Packaging.OpenXmlPackageException: 'The stream was not opened for writing.'
If I change the isEditable argument from true to false, this error goes away, but a second error then occurs when the changed text is written back to the stream:
// Writing the changed document
using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.OpenOrCreate)))
This throws:
System.ArgumentException: 'Stream was not writable.'
The final code, adapted from https://learn.microsoft.com/en-us/office/open-xml/how-to-search-and-replace-text-in-a-document-part, is:
public async Task<IActionResult> UpdateDocumentAsync([FromBody] ProcessingQuery query)
{
    BlobClient readblob = new BlobClient(new Uri("https://XXXXXX/docxviewer/outputviewer/blob.docx?sp=racwd"));
    using (HttpClient client = new HttpClient())
    {
        using (Stream streamToWrite = await client.GetStreamAsync(readblob.Uri))
        {
            try
            {
                using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(streamToWrite, false))
                {
                    string docText = null;
                    using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
                    {
                        docText = sr.ReadToEnd();
                    }
                    // Replace each placeholder; this will become a dictionary of key-value pairs
                    Regex regexText1 = new Regex("{first_name}");
                    docText = regexText1.Replace(docText, query?.first_Name);
                    Regex regexText2 = new Regex("{last_name}");
                    docText = regexText2.Replace(docText, query?.last_Name);
                    Regex regexText3 = new Regex("{phone}");
                    docText = regexText3.Replace(docText, query?.phone);
                    // Writing the changed document
                    using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.OpenOrCreate)))
                    {
                        sw.Write(docText);
                        sw.Flush();
                        sw.Close();
                    }
                    wordDoc.Close();
                }
            }
            catch (Exception ex)
            {
                return BadRequest(ex.Message);
            }
            streamToWrite.Close();
        }
    }
    return Ok(readblob);
}
Please help us fix this code, with relevant code samples and documentation. If there is an issue with the blob storage token, please help us with that too. :)

Poi4Xpages - write file to document during postGeneration

As per a suggestion recently received from Stwissel, I am using the following code to convert a file to MIME and attach it to a Notes document during the post-generation process of POI4Xpages.
EDITED NEW CODE:
The following code attaches to document, but throws an error: 502 Bad Gateway - The server returned an invalid or incomplete response.
public void saveExcel(Workbook a, Document newDoc) throws NotesException, IOException {
    newDoc.replaceItemValue("Form", "Provider");
    // Create the stream
    Session session = DominoUtils.getCurrentSession();
    Stream stream = session.createStream();
    // Write the workbook to a ByteArrayOutputStream
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    a.write(bos);
    // Convert the output stream to an input stream
    InputStream is = new ByteArrayInputStream(bos.toByteArray());
    stream.setContents(is);
    MIMEEntity m = newDoc.createMIMEEntity("body");
    MIMEHeader header = m.createHeader("content-disposition");
    header.setHeaderVal("Mime attachment");
    m.setContentFromBytes(stream, "application/vnd.ms-excel", MIMEEntity.ENC_IDENTITY_BINARY);
    m.decodeContent();
    newDoc.save(true, true);
}
ORIGINAL CODE:
var stream:NotesStream = session.createStream();
// Do not automatically convert MIME to rich text
session.setConvertMIME(false);
var doc:NotesDocument = database.createDocument();
doc.replaceItemValue("Form", "Provider");
var body:NotesMIMEEntity = doc.createMIMEEntity();
var header:NotesMIMEHeader = body.createHeader("Subject");
header.setHeaderVal("MIME attachment");
if (stream.open("c:\\notes\\data\\abc.xlsx", "binary")) {
    if (stream.getBytes() != 0) {
        body.setContentFromBytes(stream, "application/vnd.ms-excel",
            NotesMIMEEntity.ENC_IDENTITY_BINARY);
    } else requestScope.status = "File was not found.";
} else requestScope.status = "Could not open file.";
stream.close();
doc.save(true, true);
// Restore conversion
session.setConvertMIME(true);
However, this code only attaches a file that is already stored in a local directory on the server. How can I get this code to take the POI fileOutputStream and attach that instead?

It's a mix of what Knut and I have commented. The important part is that you use the write() method to pass the workbook data to an output stream.
// Create the stream
Stream stream = session.createStream();
// Write the workbook (you haven't clarified) to a ByteArrayOutputStream
ByteArrayOutputStream bos = new ByteArrayOutputStream();
workbook.write(bos);
// Convert the output stream to an input stream
InputStream is = new ByteArrayInputStream(bos.toByteArray());
stream.setContents(is);
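The conversion in the snippet above does not depend on POI or Domino, so it can be tried in isolation. The following is a minimal sketch in plain Java, assuming a hypothetical StreamWritable interface as a stand-in for any object that has a write(OutputStream) method (such as the workbook). The class, interface, and method names are illustrative only and are not part of POI or the Notes API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class StreamRoundTrip {
    /** Hypothetical stand-in for anything exposing write(OutputStream), e.g. a workbook. */
    public interface StreamWritable {
        void write(OutputStream out) throws IOException;
    }

    /** Buffers everything the source writes, then exposes the same bytes for reading. */
    public static InputStream toInputStream(StreamWritable source) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        source.write(bos);
        // toByteArray() returns a copy of the buffered bytes
        return new ByteArrayInputStream(bos.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        // Placeholder bytes; a real caller would pass workbook::write instead
        byte[] payload = "fake workbook bytes".getBytes(StandardCharsets.UTF_8);
        InputStream is = toInputStream(out -> out.write(payload));
        System.out.println(is.available() + " bytes ready to read");
    }
}
```

Because toByteArray() copies the buffer, the data is briefly held in memory twice. That is usually acceptable for a generated report, but it may matter for very large workbooks.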

Downloading bulk files from sharepoint library

I want to download the files from a SharePoint document library through code, as there are thousands of files in the library.
I am thinking of creating a console application that I will run on the SharePoint server to download the files. Is this approach correct, or is there a more efficient way to do this?
Any help with code will be highly appreciated.

Like SigarDave said, it's perfectly possible to achieve this without writing a single line of code. But if you really want to code the solution for this, it's something like:
using System;
using System.IO;
using System.Net;
using Microsoft.SharePoint;
class Program
{
    static void Main(string[] args)
    {
        // Change to the URL of your site
        using (var site = new SPSite("http://MySite"))
        using (var web = site.OpenWeb())
        {
            var list = web.Lists["MyDocumentLibrary"]; // Get the library
            foreach (SPListItem item in list.Items)
            {
                if (item.File != null)
                {
                    // Build the absolute URL to pass to the WebClient object
                    var fileUrl = string.Format("{0}/{1}", site.Url, item.File.Url);
                    var result = DownloadFile(fileUrl, "C:\\FilesFromMyLibrary\\", item.File.Name);
                    Console.WriteLine(result ? "Downloaded \"{0}\"" : "Error on \"{0}\"", item.File.Name);
                }
            }
        }
        Console.ReadKey();
    }
    private static bool DownloadFile(string url, string dest, string fileName)
    {
        try
        {
            using (var client = new WebClient())
            {
                // Change the credentials to a user that has the necessary
                // permissions on the library
                client.Credentials = new NetworkCredential("Username", "Password", "Domain");
                // Download errors are caught below and reported as false
                var bytes = client.DownloadData(url);
                using (var file = File.Create(dest + fileName))
                {
                    file.Write(bytes, 0, bytes.Length); // Write file to disk
                    return true;
                }
            }
        }
        catch (Exception)
        {
            return false;
        }
    }
}

Another way, without using any scripts, is to open the document library in IE and click "Open in File Explorer" in the ribbon. You can then drag and drop the files right onto your desktop.

Merge memorystreams to one iText document

I have four MemoryStreams of data that I want to merge and then open as a PDF document, without creating any files.
It's possible to write them to files and then merge those, but that would be bad practice and can also cause a few issues, so I want to avoid it.
However, I cannot find a way to merge the MemoryStreams with iText 5 for .NET.
Right now, this is how I do it with files:
private static void ConcatenateDocuments()
{
    var stream = new MemoryStream();
    var readerFrontPage = new PdfReader(Folder + FrontPageName);
    var readerDocA = new PdfReader(Folder + docA);
    var readerDocB = new PdfReader(Folder + DocB);
    var readerAppendix = new PdfReader(Folder + Appendix);
    var pdfCopyFields = new PdfCopyFields(stream);
    pdfCopyFields.AddDocument(readerFrontPage);
    pdfCopyFields.AddDocument(readerDocA);
    pdfCopyFields.AddDocument(readerDocB);
    pdfCopyFields.AddDocument(readerAppendix);
    pdfCopyFields.Close();
    SavePdf(stream, FilenameReport);
}
Since I need to stop using files, I keep the MemoryStreams, as the different parts are built from different resources. So I have references to these MemoryStreams.
How can this be done?

The error PDF header signature not found can be fixed in this case by setting the stream's Position back to 0. Since you're not getting the error Cannot access a closed Stream I'm assuming that you are already correctly setting the PdfWriter's CloseStream to false.
Below is a full working C# 2010 WinForm app targeting iTextSharp 5.1.1.0 that creates three PDFs in MemoryStreams and combines them. Since I don't have a web server handy I'm writing them to disk.
using System;
using System.Text;
using System.Windows.Forms;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }
        private void Form1_Load(object sender, EventArgs e)
        {
            //Create three MemoryStreams
            MemoryStream[] streams = { CreateDoc("Page 1"), CreateDoc("Page 2"), CreateDoc("Page 3") };
            //I don't have a web server handy so I'm going to write my final MemoryStream to a byte array and then to disk
            byte[] bytes;
            //Create our final combined MemoryStream
            using (MemoryStream finalStream = new MemoryStream())
            {
                //Create our copy object
                PdfCopyFields copy = new PdfCopyFields(finalStream);
                //Loop through each MemoryStream
                foreach (MemoryStream ms in streams)
                {
                    //Reset the position back to zero
                    ms.Position = 0;
                    //Add it to the copy object
                    copy.AddDocument(new PdfReader(ms));
                    //Clean up
                    ms.Dispose();
                }
                //Close the copy object
                copy.Close();
                //Get the raw bytes to save to disk
                bytes = finalStream.ToArray();
            }
            //Write out the file to the desktop
            string outputFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Combined.pdf");
            using (FileStream fs = new FileStream(outputFile, FileMode.Create, FileAccess.Write, FileShare.None))
            {
                fs.Write(bytes, 0, bytes.Length);
            }
            this.Close();
        }
        /// <summary>
        /// Helper method to create temporary documents
        /// </summary>
        private MemoryStream CreateDoc(string name)
        {
            MemoryStream ms = new MemoryStream();
            using (Document doc = new Document(PageSize.LETTER))
            {
                using (PdfWriter writer = PdfWriter.GetInstance(doc, ms))
                {
                    writer.CloseStream = false;
                    doc.Open();
                    doc.Add(new Paragraph(name));
                    doc.Close();
                }
            }
            return ms;
        }
    }
}

While it seems that PdfReader cannot take the stream directly, passing the stream's byte array works:
var readerFrontPage = new PdfReader(streamFrontPage.ToArray());

Dynamically add mergefields in existing docx-document

Is it possible to add merge fields to an existing .docx document without using interop, using only the Open XML SDK from code-behind?

Yes, this is possible. I've created a small method below: you pass in the name you want to assign to the merge field and it creates the field for you.
The code below creates a new document, but it should be easy enough to use the method to append to an existing document. Hope this helps you:
using System;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            using (WordprocessingDocument package = WordprocessingDocument.Create("D:\\ManualMergeFields.docx", WordprocessingDocumentType.Document))
            {
                package.AddMainDocumentPart();
                Paragraph nameMergeField = CreateMergeField("Name");
                Paragraph surnameMergeField = CreateMergeField("Surname");
                Body body = new Body();
                body.Append(nameMergeField);
                body.Append(surnameMergeField);
                // Use the body directly; wrapping it in another Body would nest two body elements
                package.MainDocumentPart.Document = new Document(body);
            }
        }
        static Paragraph CreateMergeField(string name)
        {
            if (!String.IsNullOrEmpty(name))
            {
                string instructionText = String.Format(" MERGEFIELD {0} \\* MERGEFORMAT", name);
                SimpleField simpleField1 = new SimpleField() { Instruction = instructionText };
                Run run1 = new Run();
                RunProperties runProperties1 = new RunProperties();
                NoProof noProof1 = new NoProof();
                runProperties1.Append(noProof1);
                Text text1 = new Text();
                text1.Text = String.Format("«{0}»", name);
                run1.Append(runProperties1);
                run1.Append(text1);
                simpleField1.Append(run1);
                Paragraph paragraph = new Paragraph();
                paragraph.Append(new OpenXmlElement[] { simpleField1 });
                return paragraph;
            }
            else return null;
        }
    }
}
You can download the Open XML Productivity Tool from this URL (if you do not already have it): http://www.microsoft.com/download/en/details.aspx?id=5124
This tool has a "Reflect Code" feature, so you can manually create a merge field in an MS Word document, open the document with the Productivity Tool, and see a C# code sample showing how to do the same thing in code.
It's very effective, and I used this exact tool to create the sample above. Good luck!
