ITextSharp: parse html with cyrillic/international words

I'm trying to parse an HTML file and generate a PDF. I use this code:
document.Open();
HtmlPipelineContext htmlContext = new HtmlPipelineContext(null);
htmlContext.SetTagFactory(Tags.GetHtmlTagProcessorFactory());
ICSSResolver cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(true);
IPipeline pipeline =
new CssResolverPipeline(cssResolver,
new HtmlPipeline(htmlContext,
new PdfWriterPipeline(document, writer)));
XMLWorker worker = new XMLWorker(pipeline, true);
XMLParser p = new XMLParser(true, worker, Encoding.Unicode);
p.Parse((TextReader)File.OpenText(@"Template.html"));
document.Close();
How can I define a base font if I'd like to use Cyrillic/international words?

You should register the font:
string arialuniTff = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "ARIALUNI.TTF");
FontFactory.Register(arialuniTff);
and modify the page's body:
<body face='Arial' encoding='koi8-r'>
...
</body>
For those who can read Russian, this article may be useful.
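Whatever font is registered, Cyrillic text only survives if its bytes are decoded with the charset they were saved in; a font cannot repair characters that were already garbled while the file was being read. (In .NET, File.OpenText decodes the file as UTF-8, so a KOI8-R or Windows-1251 file would need a StreamReader created with the matching Encoding.) The sketch below illustrates the effect in plain Java using only the standard library; it is not iTextSharp code, and CharsetMismatchDemo/roundTrip are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    // Hypothetical helper: writes text with one charset and reads it back with another.
    public static String roundTrip(String text, Charset writeAs, Charset readAs) {
        byte[] bytes = text.getBytes(writeAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        // The Russian word "Privet" written as Unicode escapes, so the
        // encoding of this source file itself does not matter.
        String cyrillic = "\u041f\u0440\u0438\u0432\u0435\u0442";

        String sameCharset = roundTrip(cyrillic, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String wrongCharset = roundTrip(cyrillic, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        // Matching charsets preserve the text; a mismatch yields 12 Latin-1
        // characters (one per UTF-8 byte) instead of 6 Cyrillic letters.
        System.out.println(sameCharset.equals(cyrillic));  // true
        System.out.println(wrongCharset.equals(cyrillic)); // false
        System.out.println(wrongCharset.length());         // 12
    }
}
```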

I propose the following variant:
// Connect a TrueType font with Identity-H encoding so non-Latin (e.g. Cyrillic) glyphs can be used
String FONT_LOCATION = Server.MapPath("~/fonts/arial.ttf");
BaseFont baseFont = BaseFont.CreateFont(FONT_LOCATION, BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
iTextSharp.text.Font font = new iTextSharp.text.Font(baseFont, iTextSharp.text.Font.DEFAULTSIZE, iTextSharp.text.Font.NORMAL);
// Pass this font to every Phrase that may contain Cyrillic text (lblN is the label text)
PdfPCell cell1 = new PdfPCell(new Phrase(lblN, font)) { HorizontalAlignment = Element.ALIGN_CENTER, VerticalAlignment = Element.ALIGN_MIDDLE };

Related

SpecFlow+ Excel Generate Feature File Programmatically

Given an Excel file, how can a feature file be generated programmatically?
I'm using SpecFlow 2.3.2 with the corresponding Excel plugin on .NET Framework (not .NET Core):
string excelPath = #".\CalculatorAdd.feature.xlsx";
ExcelParser excelParser = new ExcelParser( CultureInfo.CurrentCulture, CultureInfo.CurrentCulture );
var specFlowDoc = excelParser.ParseExcel(excelPath);
// all the scenarios are in this specFlowDoc,
// just need to export it as feature file, no need to generate code-behind.
// what is next ??

The feature file text can be generated with DocumentFormatter:
var docFormatter = new SpecFlow.Plus.Excel.SpecFlowPlugin.DocumentFormatter();
string featureFileStr = docFormatter.GetDocumentText(specFlowDoc);
Full code:
string excelPath = @".\CalculatorAdd.feature.xlsx";
ExcelParser excelParser = new SpecFlow.Plus.Excel.SpecFlowPlugin.ExcelParser( CultureInfo.CurrentCulture, CultureInfo.CurrentCulture );
var specFlowDoc = excelParser.ParseExcel(excelPath);
var docFormatter = new SpecFlow.Plus.Excel.SpecFlowPlugin.DocumentFormatter();
string featureFileStr = docFormatter.GetDocumentText( specFlowDoc );
// Save the Gherkin text as a .feature file (System.IO; the output path is only an example)
File.WriteAllText(@".\CalculatorAdd.feature", featureFileStr);

Search for an object, an image or a rectangle and add an image on it in Java with iText

I'm using iText to add an image to a PDF file. I added the image, but I don't want it at an absolute position because I have lots of different reports. I thought of putting a placeholder image in the PDF and replacing it, but that didn't work for me.
Here is my current code, which uses an absolute position:
ByteSource pdfVersion = remoteAccess.loadMultimedia(signature.getIdMultimedia());
// Image
Files.createDirectories(Paths.get("src/main/resources/signFolder"));
File img = new File("src/main/resources/signFolder/" + fileName);
img.deleteOnExit();
com.google.common.io.Files.write(file.read(), img);
// Pdf
File pdf = new File("src/main/resources/signFolder/" + signature.getIdMultimedia());
com.google.common.io.Files.write(pdfVersion.read(), pdf);
PdfReader reader = new PdfReader(pdf.getPath());
PdfStamper stamper = new PdfStamper(reader,
new FileOutputStream("src/main/resources/signFolder/with_image.pdf"));
Image image = Image.getInstance(img.getPath());
PdfImage stream = new PdfImage(image, "", null);
stream.put(new PdfName("ITXT_SpecialId"), new PdfName("123456789"));
PdfIndirectObject ref = stamper.getWriter().addToBody(stream);
image.setDirectReference(ref.getIndirectReference());
image.setAbsolutePosition(36, 270);
image.scalePercent(50);
PdfContentByte over = stamper.getOverContent(1);
over.addImage(image);
stamper.close();
reader.close();
I thought of searching for a rectangle in the PDF file. Do you know how to do that? Thanks in advance.
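If the underlying goal is only to avoid hard-coded coordinates such as (36, 270), one option (an assumption, not something iText does automatically) is to derive the position from the page size and the scaled image size. The helpers below are hypothetical plain Java so they run without iText; PDF user space has its origin at the bottom-left and is measured in points.

```java
public class ImagePlacement {
    // Size after scalePercent(percent), e.g. scaled(200f, 50f) == 100f.
    public static float scaled(float size, float percent) {
        return size * percent / 100f;
    }

    // Lower-left corner that puts an image of the given (scaled) width in the
    // bottom-right corner of the page, inset by margin on the right and bottom.
    public static float[] bottomRight(float pageRight, float pageBottom,
                                      float imageWidth, float margin) {
        float x = pageRight - margin - imageWidth;
        float y = pageBottom + margin;
        return new float[] { x, y };
    }

    public static void main(String[] args) {
        // A4 page (595 x 842 pt) and a 200 pt wide image scaled to 50%.
        float imageWidth = scaled(200f, 50f);
        float[] pos = bottomRight(595f, 0f, imageWidth, 36f);
        System.out.println(pos[0] + ", " + pos[1]); // 459.0, 36.0
    }
}
```

With iText 5 this would presumably be wired up as `Rectangle page = reader.getPageSize(1);` followed by `bottomRight(page.getRight(), page.getBottom(), image.getScaledWidth(), 36f)` after `image.scalePercent(50)`, then `image.setAbsolutePosition(pos[0], pos[1])`. This is untested and only computes coordinates; locating a rectangle already drawn in the PDF is a separate problem.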

Replacing tag content with html string

I have the following xml:
<foo><toReplace/></foo>
I want to replace <toReplace/> tag with the following string:
"<b>bar</b>"
How can I do that?
Right now I have the following code:
var xml = "<foo><toReplace/></foo>";
var parser = new dom.DOMParser().parseFromString(xml, "text/xml");
parser.getElementsByTagName("toReplacce")[0].textNode = "<b>bar</b>";
console.log(parser.toString()); // wanted: "<foo><b>bar</b></foo>"
The problem is that it escapes the HTML. How can I replace the content with the HTML string here?

You can always use the unescape module from npm:
var unescape = require('unescape');
console.log(unescape(parser.toString()))
When I tested your code, I also found a small typo: toReplacce instead of toReplace.
var dom = require('xmldom');
var unescape = require('unescape');
var xml = "<foo><toReplace/></foo>";
var parser = new dom.DOMParser().parseFromString(xml, "text/xml");
var a = parser.getElementsByTagName("toReplace")[0];
// textvalue is not a DOM property; textContent sets the element's text,
// which is serialized escaped, hence unescape() on the output
a.textContent = "<b>bar</b>";
console.log(unescape(parser.toString()));
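Setting an element's text always produces an escaped text node, so to actually replace <toReplace/> with a <b> element the string has to be parsed as markup and the resulting node inserted with the W3C DOM methods importNode and replaceChild. The sketch below shows that technique with Java's built-in DOM (javax.xml) rather than xmldom; whether xmldom's importNode behaves identically has not been checked here, so treat a JavaScript port as untested. ReplaceWithMarkup and replaceTag are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ReplaceWithMarkup {
    private static Document parse(DocumentBuilder builder, String xml) throws Exception {
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Replaces the first <tagName> element in xml with the element parsed from markup.
    public static String replaceTag(String xml, String tagName, String markup) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = parse(builder, xml);
        Node target = doc.getElementsByTagName(tagName).item(0);
        if (target == null) {
            throw new IllegalArgumentException("No <" + tagName + "> element found");
        }

        // Parse the HTML string as markup (not as text) and copy the node into doc.
        Document fragment = parse(builder, markup);
        Node imported = doc.importNode(fragment.getDocumentElement(), true);
        target.getParentNode().replaceChild(imported, target);

        // Serialize without the <?xml ...?> declaration.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // prints <foo><b>bar</b></foo>
        System.out.println(replaceTag("<foo><toReplace/></foo>", "toReplace", "<b>bar</b>"));
    }
}
```

The markup must be well-formed XML with a single root element; for several sibling elements you would wrap them in one element and insert its children one by one.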

Function for getting text from cursor position in CKEditor acting strangely

I'm trying to extract all text starting from the cursor position. Here is the code that I'm using:
originalText = editor.getData();
var startTag = "<span id=\x22Start\x22> </span>";
var stopTag = "<span id=\x22Stop\x22> </span>";
var startElement = CKEDITOR.dom.element.createFromHtml( startTag, editor.document );
editor.insertElement(startElement);
sText = editor.getData();
sText1 = sText + stopTag;
editor.setData(sText1);
// up to here, I've encapsulated the required text with span tags
// Using the replace function, I remove end tag of the Start span as well as removing the start tag of the Stop span!
sText1 = editor.getData();
sText2 = sText1.replace("<span id=\"Start\"> </span>", "<span id=\"Start\">");
sText2 = sText2.replace("<span id=\"Stop\"> </span>", "</span>");
// I set the data (HTML) back to the editor
editor.setData(sText2);
//alert(sText2);
// I use the innerHTML to get the text
el = editor.document.$.getElementById("Start");
return el.innerHTML;
The problem:
el.innerHTML works fine, BUT ONLY if the alert() is uncommented! I know that setData is asynchronous and that using the setData() callback should solve the problem, but unfortunately it's not working for me :(

The quick fix to try would indeed be to use the callback. With a callback, though, you cannot return el.innerHTML; you have to pass it to another function instead. You did not show the code that calls this code, so it's very hard to determine how to refactor it the way you want. Below is a dummy version that is NOT tested and only serves as an example of how callbacks might be used.
function extractTextStartingFromCursorPosition(yourCallback) {
originalText = editor.getData();
var startTag = "<span id=\x22Start\x22> </span>";
var stopTag = "<span id=\x22Stop\x22> </span>";
var startElement = CKEDITOR.dom.element.createFromHtml( startTag, editor.document );
editor.insertElement(startElement);
sText = editor.getData();
sText1 = sText + stopTag;
editor.setData(sText1, function() {
// up to here, I've encapsulated the required text with span tags
// Using the replace function, I remove end tag of the Start span as well as removing the start tag of the Stop span!
sText1 = editor.getData();
sText2 = sText1.replace("<span id=\"Start\"> </span>", "<span id=\"Start\">");
sText2 = sText2.replace("<span id=\"Stop\"> </span>", "</span>");
// I set the data (HTML) back to the editor
editor.setData(sText2, function() {
// I use the innerHTML to get the text
el = editor.document.$.getElementById("Start");
yourCallback(el.innerHTML);
});
});
}
So, if previously you used the function like this:
var html = extractTextStartingFromCursorPosition();
doSomethingWithHtml(html);
Now you would use it like this:
extractTextStartingFromCursorPosition(function(html) {
doSomethingWithHtml(html);
});
I still feel that this functionality could be implemented in a better way, but I don't have time to test or refactor a new solution for the same requirements.
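Why the alert() seems to help can be modelled in a few lines: if setData() only queues the update, code that reads the document right after the call still sees the old content, and the pause presumably gives the editor time to apply the queued update. The Java sketch below uses a hypothetical FakeEditor with a simple task queue standing in for the browser's event loop; it is not CKEditor's API, only an illustration of why the result has to be delivered through a callback.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

public class AsyncSetDataDemo {
    // Hypothetical stand-in for an editor whose setData() applies content later.
    public static class FakeEditor {
        private final Queue<Runnable> pending = new ArrayDeque<>();
        private String data = "";

        public String getData() {
            return data;
        }

        // Only queues the update; callback (may be null) runs after it is applied.
        public void setData(String newData, Runnable callback) {
            pending.add(() -> {
                data = newData;
                if (callback != null) {
                    callback.run();
                }
            });
        }

        // Plays the role of the browser's event loop.
        public void runPendingTasks() {
            while (!pending.isEmpty()) {
                pending.poll().run();
            }
        }
    }

    // Callback style: the result is handed to onResult instead of being returned.
    public static void extractAfterUpdate(FakeEditor editor, String html, Consumer<String> onResult) {
        editor.setData(html, () -> onResult.accept(editor.getData()));
    }

    public static void main(String[] args) {
        FakeEditor editor = new FakeEditor();
        editor.setData("old", null);
        editor.runPendingTasks();

        editor.setData("new", null);
        System.out.println(editor.getData()); // old: the update is still queued
        editor.runPendingTasks();
        System.out.println(editor.getData()); // new

        extractAfterUpdate(editor, "<span id=\"Start\">text</span>",
                html -> System.out.println("callback got " + html));
        editor.runPendingTasks(); // prints: callback got <span id="Start">text</span>
    }
}
```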

How to parse tagged words using Stanford NLP

I have a list of tagged sentences stored in txt file in the following format:
We_PRP 've_VBP just_RB wrapped_VBN up_RP with_IN the_DT boys_NNS of_IN Block_NNP B_NNP
Now I want to parse the sentences. I found the following code:
String filename = "tt.txt";
// This option shows loading and sentence-segmenting and tokenizing
// a file using DocumentPreprocessor.
TreebankLanguagePack tlp = new PennTreebankLanguagePack();
GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
// You could also create a tokenizer here (as below) and pass it
// to DocumentPreprocessor
// lp is a LexicalizedParser loaded earlier (not shown in this snippet)
for (List<HasWord> sentence : new DocumentPreprocessor(filename)) {
Tree parse = lp.apply(sentence);
parse.pennPrint();
System.out.println();
GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
Collection<TypedDependency> tdl = gs.typedDependenciesCCprocessed();
System.out.println(tdl);
System.out.println();
}
The parse result is long, and I suspect the problem lies in the line new DocumentPreprocessor(filename): it actually re-tags my sentences. Is there any way to skip the tagging step?

You can find the answer in the Parser FAQ; I tried it and it works for me:
// set up grammar and options as appropriate
LexicalizedParser lp = LexicalizedParser.loadModel(grammar, options);
String[] sent3 = { "It", "can", "can", "it", "." };
// Parser gets tag of second "can" wrong without help
String[] tag3 = { "PRP", "MD", "VB", "PRP", "." };
List<TaggedWord> sentence3 = new ArrayList<>();
for (int i = 0; i < sent3.length; i++) {
sentence3.add(new TaggedWord(sent3[i], tag3[i]));
}
Tree parse = lp.parse(sentence3);
parse.pennPrint();
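The FAQ snippet hard-codes the words and tags in sent3/tag3; with the question's word_TAG file, those pairs first have to be read from each line. The plain-Java sketch below assumes one sentence per line and splits each token at its last underscore, so words that themselves contain "_" survive. WordTag and parseLine are hypothetical names with no CoreNLP dependency; each pair would then become new TaggedWord(word, tag) in a list passed to lp.parse(...), as in the FAQ code.

```java
import java.util.ArrayList;
import java.util.List;

public class TaggedLineReader {
    // Hypothetical holder for one word/tag pair.
    public static final class WordTag {
        public final String word;
        public final String tag;

        public WordTag(String word, String tag) {
            this.word = word;
            this.tag = tag;
        }

        @Override
        public String toString() {
            return word + "/" + tag;
        }
    }

    // Splits one pre-tagged line such as "We_PRP 've_VBP" into word/tag pairs.
    public static List<WordTag> parseLine(String line) {
        List<WordTag> result = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank line
            }
            int sep = token.lastIndexOf('_');
            if (sep <= 0 || sep == token.length() - 1) {
                throw new IllegalArgumentException("Not a word_TAG token: " + token);
            }
            result.add(new WordTag(token.substring(0, sep), token.substring(sep + 1)));
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "We_PRP 've_VBP just_RB wrapped_VBN up_RP with_IN the_DT boys_NNS of_IN Block_NNP B_NNP";
        System.out.println(parseLine(line));
        // [We/PRP, 've/VBP, just/RB, wrapped/VBN, up/RP, with/IN, the/DT, boys/NNS, of/IN, Block/NNP, B/NNP]
    }
}
```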
