How can I write text to a PDF file - python-3.x

I'm using Python 3. I have a long text file and would like to create a new PDF and write the text into it.
I tried using reportlab, but it writes only one line.
from reportlab.pdfgen import canvas
c = canvas.Canvas("hello.pdf")
c.drawString(100,750, text)
c.save()
I know I can tell it what to write on each line. But is there a library where I can just give the text and the margins, and it will write the text to the PDF file?
Thanks
EDIT:
Or, instead, is there a library that easily converts a txt file to a PDF file?

Simply drawing your string on the canvas won't do the job.
If it's just raw text and you don't need any formatting such as headings, you can put your text in Flowables, i.e. Paragraph, and append your Flowables to your story list.
You can adjust the margins to suit your needs.
from reportlab.platypus import SimpleDocTemplate, Paragraph
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib.units import inch
from reportlab.lib.pagesizes import letter
styles = getSampleStyleSheet()
styleN = styles['Normal']
styleH = styles['Heading1']
story = []
pdf_name = 'your_pdf_file.pdf'
doc = SimpleDocTemplate(
    pdf_name,
    pagesize=letter,
    bottomMargin=.4 * inch,
    topMargin=.6 * inch,
    rightMargin=.8 * inch,
    leftMargin=.8 * inch)
with open("your_text_file.txt", "r") as txt_file:
    text_content = txt_file.read()
P = Paragraph(text_content, styleN)
story.append(P)
doc.build(
    story,
)
For more information on Flowables read reportlab-userguide
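Paragraph reads its text as XML-like markup and collapses runs of whitespace, so a raw text file can lose its line breaks, and characters such as & or < may be misread as markup. A minimal sketch of one way to prepare the text first, using only the standard library; the helper name text_to_markup_blocks and the blank-line paragraph convention are assumptions, not part of the answer above:

```python
from xml.sax.saxutils import escape


def text_to_markup_blocks(text):
    """Split plain text into blocks that are safe to pass to Paragraph.

    Blank lines separate blocks; single newlines inside a block become <br/>.
    """
    blocks = []
    for chunk in text.replace("\r\n", "\n").split("\n\n"):
        chunk = chunk.strip("\n")
        if not chunk.strip():
            continue  # skip empty blocks caused by extra blank lines
        # Escape &, < and > so they are not read as markup.
        blocks.append(escape(chunk).replace("\n", "<br/>"))
    return blocks


sample = "Tom & Jerry\nline two\n\nif a < b: pass\n"
print(text_to_markup_blocks(sample))
```

Each returned string could then be wrapped as Paragraph(block, styleN) and appended to story, giving one flowable per paragraph of the text file instead of a single large one.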

Related

How to save all figures in pdf file in python created from seaborn style & dataframe?

This code outputs the grid, shown as 1, with a styled background.
def plot(grid):
    cmap = sns.light_palette("red", as_cmap=True)
    figure = pd.DataFrame(grid)
    figure = figure.style.background_gradient(cmap=cmap, axis=None)
    display(figure)
I want to store multiple images like 1, generated by the function 'plot', in a single PDF file. With matplotlib,
from matplotlib.backends.backend_pdf import PdfFile,PdfPages
pdfFile = PdfPages("name.pdf")
pdfFile.savefig(plot)
pdfFile.close()
can do this, but in this case I am facing issues because the output is a DataFrame, or because I am using the seaborn background style.
Could you please suggest how to store the output above in a single PDF file, or as PNG or JPG?
Here is my code to save all open figures to a PDF; it saves each plot to a separate page.
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
pp = PdfPages(r'C:\path\filename.pdf')  # path to where you want to save the pdf (raw string keeps the backslashes)
figNum = plt.get_fignums()  # list of all open figure numbers
for num in figNum:  # loop to add each figure to the pdf
    pp.savefig(num)  # save the figure with this number as a new page
pp.close()  # closes the opened file in memory
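A Windows path such as C:\path\filename.pdf is only safe inside a raw string, because in a normal string literal \f is the form-feed escape and silently changes the file name. A short check of that behaviour, using only the standard library (the variable names are illustrative):

```python
from pathlib import PureWindowsPath

# In a normal string literal "\f" is read as a form feed (character 0x0c).
plain = 'C:\\path\filename.pdf'
# A raw string keeps every backslash exactly as typed.
raw = r'C:\path\filename.pdf'

print('\x0c' in plain)            # the form feed ended up in the path
print('\x0c' in raw)              # the raw string is unaffected
print(PureWindowsPath(raw).name)  # file name parsed from the intact path
```

Forward slashes ('C:/path/filename.pdf') are another common way to avoid the problem.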
We can create a folder named 'image' and store the code's output there as PNG images. We need the dataframe_image package for that.
import pandas as pd
import seaborn as sns
import dataframe_image as dfi
from PIL import Image
def plot(grid):
    cmap = sns.light_palette("red", as_cmap=True)
    figure = pd.DataFrame(grid)
    figure = figure.style.background_gradient(cmap=cmap, axis=None)
    dfi.export(figure, 'image/df_styled.png', max_cols=-1)
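The question asks for a single PDF, while this answer stops at PNG files. One possible follow-up step, not part of the original answer, is to combine the exported images with Pillow's multi-page save. This is a sketch under assumptions: the function name images_to_pdf and the output name all_tables.pdf are hypothetical, and the PNG files are assumed to sit in the 'image' folder.

```python
from pathlib import Path

from PIL import Image


def images_to_pdf(image_paths, pdf_path):
    """Write the given images into one PDF, one image per page."""
    # Convert to RGB first; Pillow's PDF writer may reject images with alpha.
    pages = [Image.open(path).convert("RGB") for path in image_paths]
    if not pages:
        raise ValueError("image_paths is empty")
    # save_all plus append_images makes Pillow emit a multi-page file.
    pages[0].save(pdf_path, save_all=True, append_images=pages[1:])


if __name__ == "__main__":
    # Assumes the PNG files were exported into a folder named 'image'.
    pngs = sorted(Path("image").glob("*.png"))
    if pngs:
        images_to_pdf(pngs, "all_tables.pdf")
```

Each call to plot would need its own file name for this to collect more than one table.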

Python 3: write Russian text to PDF file

The problem was to write Russian text to a PDF file. I tried several encodings, but that didn't solve the problem. You can find the solution I came up with in the answer section. Please note that the write_to_file function writes text on one page only; it was not tested on larger files.
Here is a solution. I am using reportlab version 3.5.42.
from reportlab.lib.units import cm
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
from reportlab.lib.styles import ParagraphStyle
from reportlab.platypus import Paragraph, Frame
from reportlab.graphics.shapes import Drawing, Line
from reportlab.pdfgen.canvas import Canvas
def write_to_file(filename, story):
    """
    (str, list) -> None
    Write text from list of strings story to filename. filename should be in format name.pdf.
    Russian text is supported by font DejaVuSerif. DejaVuSerif.ttf should be saved in the working directory.
    filename is stored in the same working directory.
    """
    canvas = Canvas(filename)
    pdfmetrics.registerFont(TTFont('DejaVuSerif', 'DejaVuSerif.ttf'))
    # Various styles option are available, consult reportlab User Guide
    style = ParagraphStyle('russian_text')
    style.fontName = 'DejaVuSerif'
    style.leading = 0.5*cm
    # Using XML format for new line character
    for i, part in enumerate(story):
        story[i] = Paragraph(part.replace('\n', '<br></br>'), style)
    # Create a frame to make the text fit to the page, A4 format is used by default
    frame = Frame(0, 0, 21*cm, 29.7*cm, leftPadding=cm, bottomPadding=cm, rightPadding=cm, topPadding=cm,)
    # Add different parts of the story
    frame.addFromList(story, canvas)
    canvas.save()

How to fix 'ValueError("input must have more than one sentence")' Error

I'm writing a script that takes a website URL and downloads the page using Beautiful Soup. It then uses gensim.summarization to summarize the text, but I keep getting ValueError("input must have more than one sentence") even though the text has more than one sentence. The first part of the script, which downloads the text, works, but I can't get the second part to summarize the text.
import bs4 as bs
import urllib.request
from gensim.summarization import summarize
from gensim.summarization.textcleaner import split_sentences
#===========================================
print("(Insert URL)")
url = input()
sauce = urllib.request.urlopen(url).read()
soup = bs.BeautifulSoup(sauce,'lxml')
#===========================================
print(soup.title.string)
with open(soup.title.string + '.txt', 'wb') as file:
    for paragraph in soup.find_all('p'):
        text = paragraph.text.replace('.', '.\n')
        text = split_sentences(text)
        text = summarize(str(text))
        text = text.encode('utf-8', 'ignore')
        #===========================================
        file.write(text+'\n\n'.encode('utf-8'))
After the script is run, it should create a .txt file containing the summarized text in whatever folder the .py file is located.
You should not use split_sentences() before passing the text to summarize() since summarize() takes a string (with multiple sentences) as input.
In your code you are first turning your text into a list of sentences (using split_sentences()) and then converting that back to a string (with str()). The result of this is a string like "['First sentence', 'Second sentence']". It doesn't make sense to pass this on to summarize().
Instead you should simply pass your raw text as input:
text = summarize(text)
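The effect of calling str() on the sentence list can be seen without gensim. A small illustration using only built-ins (the sample sentences are made up):

```python
sentences = ["First sentence.", "Second sentence."]

# str() on a list returns its repr, brackets and quotes included.
as_repr = str(sentences)
# Joining the items gives back ordinary running text.
as_text = " ".join(sentences)

print(as_repr)  # ['First sentence.', 'Second sentence.']
print(as_text)  # First sentence. Second sentence.
```

Note also that the loop in the question summarizes each <p> element separately. If a paragraph holds only one sentence, summarizing it on its own may still raise the same error, so one option is to join the text of all paragraphs first and summarize once.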

Adding Logo in Header of Word document using python-docx

I want a logo file to be attached to the Word document every time I run the code.
Ideally the code should look like:
from docx import Document
document = Document()
logo = open('logo.eps', 'r') #the logo path that is to be attached
document.add_heading('Underground Heating Oil Tank Search Report', 0) #simple heading that will come below the logo in the header.
document.save('report for xyz.docx') #saving the file
Is this possible in python-docx, or should I try some other library? If it is possible, please tell me how.
With the following code, you can create a table with two columns: the first cell holds the logo, and the second holds the text part of the header.
from docx import Document
from docx.shared import Inches, Pt
from docx.enum.text import WD_ALIGN_PARAGRAPH
document = Document()
header = document.sections[0].header
htable=header.add_table(1, 2, Inches(6))
htab_cells=htable.rows[0].cells
ht0=htab_cells[0].add_paragraph()
kh=ht0.add_run()
kh.add_picture('logo.png', width=Inches(1))
ht1=htab_cells[1].add_paragraph('put your header text here')
ht1.alignment = WD_ALIGN_PARAGRAPH.RIGHT
document.save('yourdoc.docx')
A simpler way to include a logo and a header with some style (Heading 2 Char here):
from docx import Document
from docx.shared import Inches, Pt
doc = Document()
header = doc.sections[0].header
paragraph = header.paragraphs[0]
logo_run = paragraph.add_run()
logo_run.add_picture("logo.png", width=Inches(1))
text_run = paragraph.add_run()
text_run.text = '\t' + "My Awesome Header" # For center align of text
text_run.style = "Heading 2 Char"

Error when importing text from text file to pdf (reportlab, python3)

I have a file, hello.txt, that contains the text Hello World, and this is my code:
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import A4
c = canvas.Canvas("file.pdf", pagesize=A4)
hello=open('files/hello.txt', encoding="utf-8").read()
c.drawString(10, 800, str(hello))
c.save()
But the text in file.pdf is followed by a black box.
So, what is that black box that appears after the text? How can I remove it? Is there a better way to import text from text files into a PDF with reportlab?
Try using this one-liner on your string (it needs import string):
your_text = ''.join([x for x in your_text if x in string.printable])
This issue is caused by non-printable characters.
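One caveat, offered as an assumption since the question does not say which character produced the box: string.printable includes whitespace controls such as "\n", so a trailing newline read from the file would survive that filter, and non-ASCII letters would be dropped. A sketch of an alternative based on str.isprintable(); the helper name strip_unprintable is hypothetical:

```python
import string


def strip_unprintable(text):
    """Drop characters that str.isprintable() reports as non-printable."""
    return "".join(ch for ch in text if ch.isprintable())


# string.printable contains "\n", so filtering on it keeps a trailing newline.
print("\n" in string.printable)
# isprintable() is False for "\n" and "\r", so they are removed here.
print(repr(strip_unprintable("Hello World\n")))
```

For a single line read from a file, calling .strip() or .rstrip("\n") on the string before drawString would also remove the trailing newline.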
