I am writing a Java program to generate PDF file using iText.
I need to write the day, like "10th", with the "st/nd/rd/th" suffix as superscript text just after the number. I create one chunk for the number and another chunk for the superscript text by calling the setTextRise() method.
Then I found that the number and the superscript text were split across two lines: the number was placed at the end of one line, while the superscript text was placed at the beginning of the next.
How can I keep these two chunks on one line?
I am importing a dataset from Excel using the built-in import data wizard. However, when viewing the data in SAS, cells with newlines have all line feeds (Alt+Enter) replaced with a period (.).
For example, in excel:
"Example text
with new line"
will be read in by SAS as:
"Example text.with new line"
Usually line feeds or carriage returns are replaced by spaces; the hex code (if you format the text as hex) is 0A. When I convert the text in Excel to hex using a formula, the line feeds also show up as 0A.
However, the hex code for the period in my text (what used to be a line return in Excel) is 2E, rather than the expected 0A. This prevents me from differentiating them from normal full stops, which means there's no obvious workaround. Has anyone else come across this issue? Is there an option to change/set the default line feed replacement character in SAS?
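For reference, the same hex inspection can be done outside Excel; a minimal Python sketch (the sample characters are made up) confirms that a line feed encodes as 0A while a full stop is 2E:

```python
# Print the Latin-1 hex codes of each character in a string,
# mirroring the Excel hex formula used to inspect the imported text.
def to_hex(s: str) -> str:
    return " ".join(f"{b:02X}" for b in s.encode("latin-1"))

print(to_hex("\n"))  # 0A: a real line feed
print(to_hex("."))   # 2E: an ordinary full stop
```

So once SAS has substituted a period for the line feed, the original 0A byte is gone and the two characters are genuinely indistinguishable.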
My import code (variables replaced with 'text' for simplicity) for reference:
data work.table;
  length text $ 50;
  label text = "Text";
  format text $CHAR50.;
  informat text $CHAR50.;
  infile 'path/to/file'
    lrecl=1000
    encoding='LATIN9'
    termstr=CRLF
    dlm='7F'x
    missover
    dsd;
  input text $CHAR50.;
run;
SAS Viewer will not render so-called non-printables (characters <= '1F'x) and does not display carriage-return characters as line breaks.
Example:
Excel cell with two line breaks in the data value
Imported with
proc import datafile='sample.xlsx' out=work.have dbms=xlsx;
run;
and viewed in standard SAS data set viewer (viewtable) appear to have lost the new lines.
Rest assured they are still there.
proc print data=have;
  var text / style=[fontsize=14pt];
  format text $hex48.;
run;
I would not recommend using the Import Wizard; there are far better tools nowadays. EG's import wizard is unique in SAS tools in how it works, and really was meant only to supply a way for data analysts who were not programmers to quickly bring in data; it's not robust enough for production work.
In this case, what's happening is that SAS's method for reading the data in is very rudimentary. What it does is convert it to a delimited file, and it doesn't handle LF characters very cleanly there. Instead of keeping them, which would be possible but is riskier (remember, this has to work for any incoming file), what it does is convert those to periods.
You'll see that in the notes in the program it generates:
Some characters embedded within the spreadsheet data were
translated to alternative characters so as to avoid transmission
errors.
It's referring to the LF character in that case.
The only way to get around this that I'm aware of is to either:
Convert the file to CSV from Excel yourself, and then read it in
Use ACCESS to PC FILES (via PROC IMPORT, or the checkbox in the import wizard)
Either of those will allow you to read in your line feed characters.
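If you take the CSV route, embedded line feeds survive as long as the field is quoted; a quick Python sketch (using an in-memory file for illustration) shows that a quoted multi-line field stays one field:

```python
import csv
import io

# A CSV record whose first field contains an embedded line feed.
raw = '"Example text\nwith new line",other\r\n'

# csv keeps the newline inside the quoted field instead of
# splitting the record there.
rows = list(csv.reader(io.StringIO(raw)))
print(rows)  # [['Example text\nwith new line', 'other']]
```

Reading a real file works the same way, provided it is opened with newline='' as the csv module documentation recommends.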
I need to transfer some PDF table content to Excel. I used the PyMuPDF module to put the PDF content into a .txt file, which is easier to work with, and I did that successfully.
As you can see in the .txt file I was able to transfer each column and row of the pdf. They are displayed sequentially.
- I need some way to read the txt strings sequentially so I can put each line of the txt into a .xlsx cell.
- I need some way to set up triggers to start reading the document sequentially, and to decide which lines to throw away.
Example: start reading after a specific word, stop reading when some word is reached. These documents have headers and useless information that are also transcribed to the txt file, so I need to ignore some contents of the txt and gather only the useful information to put in the .xlsx cells.
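As a sketch of the trigger idea (the marker words and helper name here are hypothetical), you can scan the lines and keep only what falls between a start word and a stop word; the surviving lines can then be written one per cell with a spreadsheet library such as openpyxl:

```python
def extract_between(lines, start_word, stop_word):
    """Yield lines after start_word appears, until stop_word appears."""
    reading = False
    for line in lines:
        if start_word in line:
            reading = True
            continue            # skip the trigger line itself
        if reading and stop_word in line:
            return              # stop at the first stop word
        if reading:
            yield line.strip()

sample = ["HEADER", "START", "col A", "col B", "FINISH", "footer"]
print(list(extract_between(sample, "START", "FINISH")))  # ['col A', 'col B']
```

Because this is a generator, it reads the file strictly in order and never holds more than one line in memory at a time, which matters for long documents.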
I'm using the xlrd library; I would like to know how I can work with it here. (optional)
I don't know if it is a problem, but when I counted the number of lines, it returned only 15, even though the document has 568 lines in total. The cause: the loop below was iterating over the file name (a string) instead of the file object, so it counted the characters of the name rather than the lines of the file. Binding the open file to a variable fixes it:
count = 0
with open(nome_arquivo_nota, 'r') as f:
    for line in f:
        count += 1
print(count)
I have collected election results from a state board of elections. They did not have the data available via CSV, so I have a PDF. I was able to convert the PDF to text using with open(), and then I split the lines because the spacing was off. My next step is to remove text that I do not want: I only want the ED#, the candidate, and how many votes they received, so I can convert to CSV for calculation. FYI, this PDF is 490 pages long.
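The exact line layout of the converted text isn't shown, so this is only a sketch under an assumed format: if each useful line looks roughly like ED 12 SMITH 345, a regular expression can keep the ED#, candidate, and vote count, and drop headers and footers, before writing a CSV:

```python
import csv
import io
import re

# Hypothetical line layout: "ED <number> <CANDIDATE> <votes>".
# Adjust the pattern to the real layout of the converted text.
pattern = re.compile(r"^ED\s+(\d+)\s+(.+?)\s+(\d+)$")

lines = [
    "STATE BOARD OF ELECTIONS",   # header to discard
    "ED 12 SMITH 345",
    "ED 13 JONES 120",
    "Page 1 of 490",              # footer to discard
]

buf = io.StringIO()               # stands in for a real output file
writer = csv.writer(buf)
writer.writerow(["ed", "candidate", "votes"])
for line in lines:
    m = pattern.match(line)
    if m:                         # keep only lines that match the layout
        writer.writerow(m.groups())
print(buf.getvalue())
```

Running the same loop over all 490 pages of extracted text would leave only the matching rows, so the unwanted headers never need to be stripped explicitly.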
I have a command button that inserts data into a worksheet from a userform; it then saves it as a .csv.
We then load the CSV data using another userform. However, the problem arises when a carriage return is typed into a text box and inserted. Obviously the first solution I can think of is to stop these characters being entered; however, is there a better solution?
@Miguel has the right idea (his comment is on the main question thread).
This approach involves defining a list of integers relating to the ASCII codes for the characters you're having trouble with. Line feed (10) will definitely be in there, and you can decide on carriage return (13), commas (44), or speech marks (34).
Const ListOfSpecialChars = "10,13,34,44"
You'll need an 'encoding' proc that accepts a string and outputs a string.
It would transform text in this kind of fashion:
ab,cd"ef -> ab<<44>>cd<<34>>ef
This would be achieved by splitting the const and looping through each of the constituents, executing a replace:
For Each splitCharVal In Split(ListOfSpecialChars, ",")
    stringToEncode = Replace(stringToEncode, Chr(splitCharVal), "<<" & splitCharVal & ">>")
Next
You'll also need a 'decoding' proc that does the opposite, which I'll let you work out.
So, when saving a file to CSV, you'll need to loop through the cells of each row in turn, encoding the text found within, then writing out a row to the file.
When reading in a row from the encoded CSV, you'll need to run the decode operation prior to writing out the text to the worksheet.
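The same encode/decode scheme can be sketched in Python for clarity (the <<n>> marker format follows the answer above; the function names are mine):

```python
SPECIAL_CHARS = [10, 13, 34, 44]   # LF, CR, double quote, comma

def encode(text: str) -> str:
    """Replace each special character with a <<code>> marker."""
    for code in SPECIAL_CHARS:
        text = text.replace(chr(code), f"<<{code}>>")
    return text

def decode(text: str) -> str:
    """Reverse the encoding, restoring the original characters."""
    for code in SPECIAL_CHARS:
        text = text.replace(f"<<{code}>>", chr(code))
    return text

s = 'ab,cd"ef'
print(encode(s))                   # ab<<44>>cd<<34>>ef
assert decode(encode(s)) == s      # round trip is lossless
```

One caveat: if a cell could legitimately contain literal text like <<44>>, you would need a more elaborate escape scheme, but for typical form input this simple substitution is enough.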
I have an Excel spreadsheet that is linked to a Word document.
This Word document is a letter.
I have a field in Excel that is optional for the user to enter data into, but if something is present then it needs to be included in the letter.
I can do all that just fine, but this additional data needs to be in its own paragraph in the Word document. Hence some new lines need to be introduced to separate this paragraph from the rest of the text, and those lines must not be present if the data isn't present.
On my Excel worksheet I have discovered this is how to do this:
=IF(B36="","",CHAR(10)&B36&CHAR(10))
However, on the Word doc, this outputs with literal " characters around the inserted text.
I do not want the " characters, but I do need the line breaks.
Entering the line breaks on the word doc beforehand is not an option, as I say, they need to NOT be present if this data is not present.
The output to Word is perfect if no data is present in the field. No line breaks and no " characters.
CHAR(10) is the line-feed character Excel uses for a new line; CHAR(11) is a vertical tab, which Word renders as a line break.
It was simply a case of replacing 10 with 11:
=IF(B36="","",CHAR(11)&B36&CHAR(11))
Which gives the desired effect: