ReportLab - Metadata, CreationDate and ModificationDate - python-3.x

How can I change metadata fields, CreationDate and ModificationDate, when I create a pdf with Reportlab?

Take a look at where modification and creation dates are set:
D['ModDate'] = D["CreationDate"] = \
Date(ts=document._timeStamp,dateFormatter=self._dateFormatter)
# ...
return PDFDictionary(D).format(document)
Basically, the metadata is a dictionary stored at the end of the binary string; the start of the string holds the file contents (the document).
Inside ReportLab, the workflow you ask about could be:
1. create a canvas
2. draw something on it
3. get the document from the canvas
4. create a PDFDictionary with artificial modification and creation dates
5. format the document with the PDFDictionary
6. save to file
The question "Change metadata of pdf file with pypdf" pursues a similar goal.

The ReportLab (currently 3.5) Canvas provides public methods, like Canvas.setAuthor(), to set the /Author, /Title, and other metadata fields (called "Internal File Annotations" in the docs, section 4.5).
However, there is no method for overriding the /CreationDate or /ModDate.
If you only need to change the formatting of the dates, you can simply use the Canvas.setDateFormatter() method.
The methods described above modify a PDFInfo object, as can be seen in the source, but this is part of a private PDFDocument (as in Canvas._doc.info).
If you really do need to override the dates, you could either hack into the private parts of the canvas, or just search the content of the resulting file object for /CreationDate (...) and /ModDate (...), and replace the value between brackets.
Here's a quick-and-dirty example that does just that:
import io
import re
from reportlab.pdfgen import canvas
# write a pdf in a file-like object
file_like_obj = io.BytesIO()
p = canvas.Canvas(file_like_obj)
# set some metadata
p.setAuthor('djvg')
# ... add some content here ...
p.save()
# replace the /CreationDate (similar for /ModDate)
# the parentheses are escaped so that they match the literal "(...)" around the value
pdf_bytes = file_like_obj.getvalue()
pdf_bytes = re.sub(rb'/CreationDate \(.*?\)', b'/CreationDate (D:19700101010203+01)', pdf_bytes)
# write to actual file
with open('test.pdf', 'wb') as pdf:
    pdf.write(pdf_bytes)
The example above just illustrates the principle. Obviously one could use fancy regular expressions with lookaround etc.
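A slightly more defensive variant of the same idea, as a sketch only: it patches both /CreationDate and /ModDate in one pass and runs on a hand-written byte string, so it does not need ReportLab. The helper name `override_pdf_dates` and the sample bytes are made up for illustration. It assumes the info dictionary is stored uncompressed and unencrypted. Note that a replacement of a different length shifts the byte offsets recorded in the cross-reference table, so a same-length value is the safer choice.

```python
import re

# Matches "/CreationDate (...)" or "/ModDate (...)" and captures the key name.
# The value is a PDF literal string; escaped characters inside it are allowed.
_DATE_ENTRY = re.compile(rb"/(CreationDate|ModDate)\s*\((?:[^()\\]|\\.)*\)")


def override_pdf_dates(pdf_bytes: bytes, pdf_date: bytes) -> bytes:
    """Replace the value of every /CreationDate and /ModDate entry."""
    def _swap(match):
        # Build the replacement by hand so no backslash escapes are interpreted.
        return b"/" + match.group(1) + b" (" + pdf_date + b")"

    return _DATE_ENTRY.sub(_swap, pdf_bytes)


# Hypothetical fragment of an info dictionary, for demonstration only.
sample = (
    b"<< /Author (djvg) /CreationDate (D:20190102030405+01'00') "
    b"/ModDate (D:20190102030405+01'00') >>"
)
patched = override_pdf_dates(sample, b"D:19700101010203+01'00'")
print(patched.decode("ascii"))
```

With a real file, `sample` would be the bytes returned by `file_like_obj.getvalue()` after `p.save()`.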
From the pdf spec:
Date values used in a PDF shall conform to a standard date format, which closely follows that of the international standard ASN.1 (Abstract Syntax Notation One), defined in ISO/IEC 8824. A date shall be a text string of the form
(D:YYYYMMDDHHmmSSOHH'mm)
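To produce a value in that form from Python, something along these lines could work. This is a minimal sketch, not part of ReportLab: the function name `to_pdf_date` is hypothetical, it expects a timezone-aware datetime, and it writes `Z` when the offset to UT is zero.

```python
from datetime import datetime, timedelta, timezone


def to_pdf_date(moment: datetime) -> str:
    """Format an aware datetime as D:YYYYMMDDHHmmSSOHH'mm."""
    offset = moment.utcoffset()
    if offset is None:
        raise ValueError("expected a timezone-aware datetime")
    stamp = moment.strftime("D:%Y%m%d%H%M%S")
    total_minutes = int(offset.total_seconds() // 60)
    if total_minutes == 0:
        # Z means local time equals UT; the HH'mm part is then left out.
        return stamp + "Z"
    sign = "+" if total_minutes > 0 else "-"
    hours, minutes = divmod(abs(total_minutes), 60)
    return f"{stamp}{sign}{hours:02d}'{minutes:02d}"


example = to_pdf_date(
    datetime(1970, 1, 1, 1, 2, 3, tzinfo=timezone(timedelta(hours=1)))
)
print(example)
```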


Convert ITAB to XSTRING and back

I need to convert an internal table (itab) into an xstring, or something similar, and store it in a database table.
Later I need to read this xstring back from the database table and convert it into the original itab with exactly the same content as before.
I tried several function modules, such as
SCMS_STRING_TO_XSTRING or SCMS_XSTRING_TO_BINARY, but I didn't find anything to convert it back.
Has anybody tried something like this before and has some samples for me?
Unfortunately I didn't find anything on other blogs or elsewhere.
An easy solution to convert into an xstring:
CALL TRANSFORMATION id SOURCE root = it_table RESULT XML DATA(lv_xstring).
Back would be like:
CALL TRANSFORMATION id SOURCE XML lv_xstring RESULT root = it_table.
For more information, see the ABAP documentation about data serialization and deserialization by using the XSL Identity Transformation.
Use
import ... from data buffer
and
export ... to data buffer
to (re)store any variable as an xstring.
Or you can use
import|export ... from|to database ...
I wrote some methods to do this:
First I loop over the table and concatenate its lines into a string.
Then I convert the string into an xstring.
LOOP AT IT_TABLE ASSIGNING FIELD-SYMBOL(<LS_TABLE>).
  CONCATENATE LV_STRING <LS_TABLE> INTO LV_STRING SEPARATED BY CL_ABAP_CHAR_UTILITIES=>NEWLINE.
ENDLOOP.
CALL FUNCTION 'SCMS_STRING_TO_XSTRING'
  EXPORTING
    TEXT   = IV_STRING
  IMPORTING
    BUFFER = LV_XSTRING.
The way back would be:
1. Convert the xstring back to a string.
2. Split the string into the table.
TRY.
    CL_BCS_CONVERT=>XSTRING_TO_STRING(
      EXPORTING
        IV_XSTR   = IV_XSTRING
        IV_CP     = 1100 " SAP character set identification
      RECEIVING
        RV_STRING = LV_STRING
    ).
  CATCH CX_BCS.
ENDTRY.
SPLIT IV_STRING AT CL_ABAP_CHAR_UTILITIES=>NEWLINE INTO: TABLE <LT_TABLE>.
" The concatenation starts with an empty string, so the first row is empty
READ TABLE <LT_TABLE> ASSIGNING FIELD-SYMBOL(<LS_TABLE>) INDEX 1.
IF <LS_TABLE> IS INITIAL.
  DELETE TABLE <LT_TABLE> FROM <LS_TABLE>.
ENDIF.

How can I change a variable in a .txt template, if I have more than 5000 different values for this variable in an excel table?

I have an Excel table with a column called 'interface'. I want code that pulls each interface value (Port-channel47, Port-channel46, etc.) and puts it into my .txt template, replacing the {interface} placeholder.
The value I want to replace in the .txt template is "{interface}".
I tried this code:
but I get lost when I want to pull the data.
Can anyone help me? Thank you so much in advance.
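Use string.Template:
from string import Template
# Template only substitutes $interface or ${interface},
# so the placeholder in template.txt must be written that way
with open('template.txt') as fp:
    template = Template(fp.read())
# df is the DataFrame loaded from the Excel table; use the column name exactly as it appears there
for interface, group in df.groupby('Interface'):
    file_name = interface.replace('/', '_') + '_output.txt'
    with open(file_name, 'w') as fp:
        content = template.substitute(interface=interface)
        fp.write(content)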
When you iterate over a DataFrameGroupBy object, it yields 2-tuples containing the key of the group and all rows matching that key.
A second thing to watch out for is that / is not valid inside a file name on most operating systems, so you must replace it with some other character (I chose _).
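If the template has to keep the literal {interface} placeholder from the question, plain str.replace is an alternative to string.Template. The sketch below is only an illustration: the helper names and the sample template text are invented, and the list of interface names stands in for the values read from the Excel column.

```python
def render(template_text: str, interface: str) -> str:
    """Fill the literal {interface} placeholder with one interface name."""
    return template_text.replace("{interface}", interface)


def output_name(interface: str) -> str:
    """Build a file name, replacing '/' because it is a path separator."""
    return interface.replace("/", "_") + "_output.txt"


def render_all(template_text: str, interfaces) -> dict:
    """Map each output file name to its rendered content."""
    return {output_name(name): render(template_text, name) for name in interfaces}


# Hypothetical template text and interface values, for demonstration only.
template_text = "interface {interface}\n description uplink\n"
rendered = render_all(template_text, ["Port-channel47", "GigabitEthernet0/1"])

for file_name, content in rendered.items():
    print(file_name)
    print(content)
```

Writing the files is then a loop over `rendered.items()` with `open(file_name, 'w')`.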

Fails to parse Hebrew text from pdf using iText 7 with .net

I am trying to read a PDF file with several pages, using iText 7 on .NET Core 2.1.
The following is my code:
Rectangle rect = new Rectangle(0, 0, 1100, 1100);
LocationTextExtractionStrategy strategy = new LocationTextExtractionStrategy();
inputStr = PdfTextExtractor.GetTextFromPage(pdfDocument.GetPage(i), strategy);
inputStr gets the following string:
"\u0011\v\u000e\u0012\u0011\v\f)(*).=*%'\f*).5?5.5*.\a \u0011\u0002\u001b\u0001!\u0016\u0012\u001a!\u0001\u0015\u001a \u0014\n\u0015\u0017\u0001(\u001b)\u0001)\u0016\u001c*\u0012\u0001\u001d\u001a \u0016* \u0015\u0001\u0017\u0016\u001b\u001a(\n,\u0002>&\u00...
and in the Text Visualizer, it looks like that:
)(*).=*%'*).5?5.5*. !!
())* * (
,>&2*06) 2.-=9 )=&,

2..*0.5<.?
.110
)<1,3
  2.3*1>?)10/6
 (& >(*,1=0>>*1?

  2.63)&*,..*0.5
  206)&13'?*9*<
  *-5=0>
?*&..,?)..*0.5
It looks like I am unable to resolve the encoding, or there is a specific, custom encoding at the PDF level that I cannot read or parse.
Looking at the Document Properties, under Fonts it says the following:
Any ideas how I can parse the document correctly?
Thank you
Yaniv
Analysis of the shared files
file1_copyPasteWorks.pdf
The font definitions here have an invalid ToUnicode entry:
/ToUnicode/Identity-H
The ToUnicode value is specified as
A stream containing a CMap file that maps character codes to Unicode values
(ISO 32000-2, Table 119 — Entries in a Type 0 font dictionary)
Identity-H is a name, not a stream.
Nonetheless, Adobe Reader interprets this name, and for apparently any name starting with Identity- assumes the text encoding for the font to be UCS-2 (essentially UTF-16). As this indeed is the case for the character codes used in the document, copy&paste works, even if for the wrong reasons. (Without this ToUnicode value, Adobe Reader also returns nonsense.)
iText 7, on the other hand, for mapping to Unicode first follows the Encoding value with unexpected results.
Thus, in this case Adobe Reader arrives at a better result by interpreting meaning into an invalid piece of data (and without that also returns nonsense).
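To make "the character codes are UCS-2" concrete, here is a small Python illustration. It is not iText code and does not reproduce what either program does internally; the byte values are a made-up sample spelling a Hebrew word. The same bytes are read once as two-byte big-endian code units and once byte by byte.

```python
def decode_as_ucs2(raw: bytes) -> str:
    """Read each pair of bytes as one big-endian UTF-16 code unit."""
    return raw.decode("utf-16-be")


# Hypothetical character codes: U+05E9 U+05DC U+05D5 U+05DD
codes = bytes([0x05, 0xE9, 0x05, 0xDC, 0x05, 0xD5, 0x05, 0xDD])

as_ucs2 = decode_as_ucs2(codes)
# Reading the same bytes one at a time yields control characters
# mixed with unrelated Latin-1 letters instead of Hebrew.
as_single_bytes = codes.decode("latin-1")

print(as_ucs2)
print(repr(as_single_bytes))
```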
file2_copyPasteFails.pdf
The font definitions here have valid but incomplete ToUnicode maps which only contain entries for the used Western European characters but not for Hebrew ones. They don't have Encoding entries.
Both Adobe Reader and iText 7 here trust the ToUnicode map and, therefore, cannot map the Hebrew glyphs.
How to parse
file1_copyPasteWorks.pdf
In case of this file the "problem" is that iText 7 applies the Encoding map. Thus, for decoding the text one can temporarily replace the Encoding map with an identity map:
for (int i = 1; i <= pdfDocument.GetNumberOfPages(); i++)
{
    PdfPage page = pdfDocument.GetPage(i);
    PdfDictionary fontResources = page.GetResources().GetResource(PdfName.Font);
    foreach (PdfObject font in fontResources.Values(true))
    {
        if (font is PdfDictionary fontDict)
            fontDict.Put(PdfName.Encoding, PdfName.IdentityH);
    }
    string output = PdfTextExtractor.GetTextFromPage(page);
    // ... process output ...
}
This code shows the Hebrew characters for your file 1.
file2_copyPasteFails.pdf
Here I don't have a quick work-around. You may want to analyze multiple PDFs of that kind. If they all encode the Hebrew characters the same way, you can create your own ToUnicode map from that and inject it into the fonts like above.

How to detect the cell format like General, Percentage, currency ... in xlrd

I'm using the xlrd Python library, and I want to get information about the cell format, such as General, Percentage, Currency, ...
From the library docs, I found the following:
style_name_map = {}
This provides access via name to the extended format information for both built-in styles and user-defined
styles.
It maps name to (built_in, xf_index), where name is either the name of a user-defined style, or
the name of one of the built-in styles. Known built-in names are Normal, RowLevel_1 to RowLevel_7,
ColLevel_1 to ColLevel_7, Comma, Currency, Percent, “Comma [0]”, “Currency [0]”, Hyperlink, and
“Followed Hyperlink”.
but I could not find a way to use this information to determine the format of a cell.
You can do something like this:
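def get_cell_format(book, sheet, rowx, colx):
    # cell.xf_index is only populated if the workbook was opened with
    # xlrd.open_workbook(path, formatting_info=True)
    cell = sheet.cell(rowx, colx)
    xf = book.xf_list[cell.xf_index]
    fmt_key = xf.format_key
    fmt = book.format_map[fmt_key]
    return fmt.format_str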
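The format string returned this way is a raw Excel number format (for example "General" or "0.00%"), not a category name. Mapping it to General, Percentage, Currency and so on needs a rule of your own. The following is a rough heuristic sketch, not something xlrd provides: the function name, the category labels and the list of currency symbols are assumptions, and unusual custom formats may be misclassified.

```python
import re

CURRENCY_SYMBOLS = "$€£¥"  # assumed set; extend as needed


def classify_number_format(format_str: str) -> str:
    """Roughly classify an Excel number-format string (heuristic only)."""
    text = format_str.strip()
    if text.lower() == "general":
        return "General"
    # Locale-only markers such as [$-409] carry no currency symbol.
    text = re.sub(r"\[\$-[0-9A-Fa-f]+\]", "", text)
    if any(symbol in text for symbol in CURRENCY_SYMBOLS):
        return "Currency"
    if "%" in text:
        return "Percentage"
    # Ignore quoted literals and bracketed sections like [Red] before
    # looking for date/time letters.
    bare = re.sub(r'"[^"]*"|\[[^\]]*\]', "", text)
    if re.search(r"[dmyhs]", bare, flags=re.IGNORECASE):
        return "Date/Time"
    if re.search(r"[0#?]", bare):
        return "Number"
    return "Other"


for sample in ["General", "0.00%", '"$"#,##0.00', "yyyy-mm-dd"]:
    print(sample, "->", classify_number_format(sample))
```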

In GeoDMS, how can I transform string coordinates to dpoint?

I have problems converting coordinates in string format to dpoint format in GeoDMS GUI version 7.177.
I'm trying to read the BAG (Basisregistratie Adressen en Gebouwen, the Dutch register of addresses and buildings, a giant geo file) into GeoDMS directly from the Kadaster. It was first converted from .xml into .csv, and then the building shapes were transformed into a format seemingly the same as the Vesta format, e.g.:
{5:{249943.307,593511.272}{249948.555,593512.791}{249946.234,593520.809}{249940.987,593519.29}{249943.307,593511.272}}
I am able to read the transformed CSV file into GeoDMS, then also able to write it as strings to .dmsdata format for speed and load it from there into GeoDMS again. However, when wanting to transform the strings into coordinates, I get the error
DPoint Error: Cannot find operator for these arguments:
arg1 of type DataItem<String>
Possible cause: argument type mismatch. Check the types of the used arguments.
My GeoDMS code looks like
unit<uint32> altBag:
    storageName = 'c:/zandbak/output/bagPND.fss'
,   storageReadOnly = 'true'
,   dialogType = 'map'
,   dialogData = 'geometry'
{
    attribute <string> pandGeometrie; // works and looks good
    attribute <dpoint> geometry := dpoint(pandGeometrie); // doesn't work, error above
    attribute <rdc> geometry2 := pandGeometrie[rdc]; // doesn't work either
}
Is there a way to do this? Or is string to dpoint (or another type of point) unsupported and should I transform the CSV to shape file first?
You can try this:
attribute<dpoint> Geometry(poly) := dpolygon(GeometryStr);
and if a specific projection is required:
attribute<rdc_meter> Geometry2(poly) := value(GeometryStr, rdc_meter);
