Why is junk text written when writing to a text file in Python? - python-3.x

I have stored the contents of a .txt file in a list, as shown below, and I am
trying to write it to a new .txt file. The file is written without errors, but
instead of the text I get junk characters. How can I get rid of this? Please
help.
ListOfLines = ['ÿþGroup:Test\n', '\n', 'Fields:³TESTNO³TESTNUM³\n', '\n',
               '³37³DUCK³DAFFY³³³³2']
fileName = open("path" + '//' + 'TestFile' + '.txt', "w+")
for item in ListOfLines:
    x = (''.join([str(item)]))
    fileName.write(x)
Output:
片畯㩰䅐䥔久ൔഊ䘊敩摬㩳傳呁低䲳十乔䵁덅䥆乒䵁덅䥍乄䵁덅䥔䱔덅啓䙆塉傳呁䑉傳呁䥂呒덈䅐協塅亳䵁ㅅ傳呁䑉댱䍏啃䅐더䱈䰷䍏덋䅐䡔卉더䅐䍔䵏䆳䑄䕒卓䖳䡔䥎덃䥍剌

Set encoding='utf-8' when you open the file for reading. It should then work.
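
As a rough illustration of the encoding advice: the 'ÿþ' at the start of the first list item is how the byte pair FF FE (the UTF-16 little-endian byte-order mark) appears when it is decoded with a single-byte codec such as Latin-1. If those two characters end up as the first two bytes of the new file, some editors will treat the whole file as UTF-16, which would be consistent with the output shown in the question. The sketch below assumes the rest of the text was read correctly; write_lines and the output file name are hypothetical names used only for illustration. It drops the stray marker and passes an explicit encoding to open().

```python
import os
import tempfile

STRAY_BOM = "\u00ff\u00fe"  # 'ÿþ': the bytes FF FE decoded as Latin-1


def write_lines(lines, path, encoding="utf-8"):
    """Write lines to path with an explicit encoding, dropping a leading 'ÿþ'."""
    cleaned = list(lines)
    if cleaned and cleaned[0].startswith(STRAY_BOM):
        cleaned[0] = cleaned[0][len(STRAY_BOM):]
    with open(path, "w", encoding=encoding) as f:
        f.writelines(cleaned)
    return cleaned


if __name__ == "__main__":
    list_of_lines = ['ÿþGroup:Test\n', '\n', 'Fields:³TESTNO³TESTNUM³\n', '\n',
                     '³37³DUCK³DAFFY³³³³2']
    # Hypothetical output location in the system temp directory
    out_path = os.path.join(tempfile.gettempdir(), "TestFile.txt")
    write_lines(list_of_lines, out_path)
    with open(out_path, encoding="utf-8") as f:
        print(f.read())
```

If the original file really is UTF-16 on disk, opening it with encoding='utf-16' when reading may be the more direct fix; which of the two applies depends on how the file was produced.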

Related

How to handle umlauts in a Logic App for export to CSV

I created a Logic App to export some data to a *.csv file.
The data to be exported contains German umlauts.
I read all the needed values into variables, which are then concatenated and added to an array.
Finally I get an array of semicolon-separated strings with the values in it.
This result (the ExportString variable) is then added to an email as a file attachment.
All the values are handled correctly in the Logic App and are correct in the *.csv file, but as soon as I open the CSV with Excel, the umlauts are no longer shown correctly.
Is there a way to explicitly create a file with the correct encoding within the Logic App and add that file to the email instead of the ExportString?
Or can I somehow encode the content of the ExportString variable?
Any hints?
I reproduced this in my environment and followed the steps below to get the correct output in the CSV file.
I sent my input data into a CSV table and then created a file in a file share.
When I opened the file share and downloaded the content from there, I got incorrect output, as you did.
Then I opened Azure Storage Explorer and downloaded the file from there.
When I open that downloaded file in Notepad, I get the correct output, so try doing it this way.
And when I save it as hello.csv and choose the encoding "UTF-8 with BOM", I get the correct output in the CSV as well.
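
The "UTF-8 with BOM" step can be illustrated outside of Logic Apps. The Python sketch below is not Logic App code; it only shows what such a file looks like on disk. Python's utf-8-sig codec prepends the three BOM bytes EF BB BF, which Excel typically uses to recognise a file as UTF-8. The function name and the sample rows are made up for the example.

```python
import csv
import os
import tempfile


def write_csv_with_bom(rows, path, delimiter=";"):
    """Write rows as semicolon-separated CSV, encoded as UTF-8 with a leading BOM."""
    # newline="" lets the csv module control line endings itself
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f, delimiter=delimiter).writerows(rows)


if __name__ == "__main__":
    # Hypothetical sample data containing umlauts
    path = os.path.join(tempfile.gettempdir(), "hello.csv")
    write_csv_with_bom([["Name", "Ort"], ["Müller", "Köln"]], path)
    with open(path, "rb") as f:
        print(f.read()[:3])  # the first three bytes are the BOM
```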

How to parse a big XML file using Beautiful Soup?

I am trying to parse an XML file named document.xml, which contains around 400,000 characters (including tags, line breaks, and spaces). Please find the code below:
from bs4 import BeautifulSoup

# Read the whole document, then parse it with the lxml XML parser
with open('document.xml', 'r') as document_xml_file_object:
    document_xml_file_content = document_xml_file_object.read()
xml_content = BeautifulSoup(document_xml_file_content, 'lxml-xml')
print("XML CONTENT: ", xml_content)
When I print xml_content, this is my output:
XML CONTENT: <?xml version="1.0" encoding="utf-8"?>
For smaller files it prints the complete XML. Can anyone help me understand why this is happening?
Edit: Click here to see my XML content.
Thanks in advance.
For large files it is better to use an event-driven parser such as xml.sax. BeautifulSoup loads the whole file into memory and then parses it, while xml.sax processes the document as a stream and uses considerably less memory.
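
A minimal sketch of the xml.sax approach, assuming you only need to react to elements as they stream past (here, counting tag names). TagCounter and count_tags are illustrative names, not part of the question's code; the handler keeps only a small dictionary in memory rather than the whole tree.

```python
import io
import xml.sax


class TagCounter(xml.sax.ContentHandler):
    """Counts how often each element name occurs while the document streams by."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called once for every opening tag the parser encounters
        self.counts[name] = self.counts.get(name, 0) + 1


def count_tags(source):
    """Parse a file name or binary file object and return {tag name: count}."""
    handler = TagCounter()
    xml.sax.parse(source, handler)
    return handler.counts


if __name__ == "__main__":
    # Small in-memory stand-in for document.xml
    sample = io.BytesIO(b"<root><item/><item/><note>hi</note></root>")
    print(count_tags(sample))
```

For the real file, count_tags('document.xml') would be the equivalent call; what the handler should collect depends on what you need from the document.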

Writing a list of python console output to text file

I am a newb to Python and am working on writing a file from the list of devices printed to the Python console. I am using pathlib and trying to write the entire list to a text file. I have tried many configurations, but for some reason it only writes the first device or a blank text file.
Can anyone explain to me what I am doing wrong?
Thank you.
results = nb.dcim.devices.filter(q='xxxx')
# Print results to console
for device in results:
    print(device.name)
# Write data to file
path = pathlib.Path("C:\\Temp\\device_inventory.txt")
path.write_text()
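
One possible way to approach this, sketched under assumptions: Path.write_text() needs the text to write as its argument, and it writes a single string, so the device names have to be joined first. If results can only be iterated once (an assumption about the object returned by the filter call), the print loop would also use it up before the write, so the sketch collects the names a single time and reuses them. write_device_names and the stand-in device objects are hypothetical.

```python
import pathlib
import tempfile
from types import SimpleNamespace


def write_device_names(devices, path):
    """Collect each device's name once, then write one name per line."""
    names = [str(device.name) for device in devices]
    pathlib.Path(path).write_text("\n".join(names) + "\n", encoding="utf-8")
    return names


if __name__ == "__main__":
    # Stand-in objects; in the question these come from nb.dcim.devices.filter(...)
    results = [SimpleNamespace(name="switch-01"), SimpleNamespace(name="switch-02")]
    out_file = pathlib.Path(tempfile.gettempdir()) / "device_inventory.txt"
    # Print the same names that were written to the file
    for name in write_device_names(results, out_file):
        print(name)
```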

Result file is full of symbols rather than text when I loop over files

I was looping over some files to copy their content to a new file, but after I ran the code, the new file was full of symbols instead of the text content of the files I looped over.
At first, when I ran the code without the 'encoding' argument in the open() call, it showed an error message like:
UnicodeEncodeError: 'charmap' codec can't encode character '\x8b' in position 12: character maps to <undefined>
I tried various encodings such as utf-8 and latin1, but nothing worked, and when I added errors='ignore' to the open() call, the result looked as described above.
import os
import glob
folder = os.path.join('R:', os.sep, 'Files')
def notes():
    for doc in glob.glob(folder + r'\*'):
        if doc.endswith('.pdf'):
            with open(doc,'r') as f:
                x = f.readlines()
            with open('doc1.text', 'w+') as f1:
                for line in x:
                    f1.write(line)
notes()
If I understand your example correctly and you're trying to read PDF files, your problem is not one of encoding but of file format. PDF files don't just store your text; they use their own format, which you need to be able to parse in order to extract the text. There are a couple of Python libraries that can read PDF files (such as PyPDF2). Please refer to this thread for more information: How to extract text from a PDF file?
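
To make the "format, not encoding" point concrete, here is a small standard-library sketch. It does not extract any text; it only checks whether a file begins with the %PDF- signature that PDF files normally start with, which is a sign that the file should be handed to a PDF library rather than read line by line as text. looks_like_pdf and the sample file names are hypothetical.

```python
import os
import tempfile

PDF_MAGIC = b"%PDF-"  # PDF files normally begin with this signature


def looks_like_pdf(path):
    """Return True if the file starts with the PDF signature bytes."""
    with open(path, "rb") as f:  # binary mode: no text decoding is attempted
        return f.read(len(PDF_MAGIC)) == PDF_MAGIC


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    pdf_like = os.path.join(folder, "sample.pdf")
    plain = os.path.join(folder, "sample.txt")
    # Tiny stand-in files, not real documents
    with open(pdf_like, "wb") as f:
        f.write(b"%PDF-1.7\n\x8b\x00binary data")
    with open(plain, "w", encoding="utf-8") as f:
        f.write("just text\n")
    for path in (pdf_like, plain):
        print(os.path.basename(path), looks_like_pdf(path))
```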

How can I delete some text from inside of a file?

I have a text file with the following contents:
3,3,5,7,9,10,1
I can read the file and reach the content by ReadToEnd() method, but I want to delete only '3', so that it has the contents:
5,7,9,10,10
How can I do this?
If you want to delete it from within Notepad, try find and replace, replacing 3, with an empty string. If you are doing it from within your program, you have to parse the file and write a new one, appending only the values you want.
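
The "parse the file and write a new one" idea can be sketched as follows. The question appears to use .NET (ReadToEnd()), so this Python version is only an illustration of the approach, not the asker's API; remove_value and rewrite_file are hypothetical names. It splits the text on commas, keeps every value except the unwanted one, and writes the result back.

```python
import os
import tempfile


def remove_value(text, unwanted):
    """Return the comma-separated text with every occurrence of unwanted removed."""
    values = [v.strip() for v in text.split(",")]
    kept = [v for v in values if v != unwanted]
    return ",".join(kept)


def rewrite_file(path, unwanted):
    """Read the file, drop the unwanted values, and write the result back."""
    with open(path, encoding="utf-8") as f:
        content = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(remove_value(content, unwanted))


if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "numbers.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("3,3,5,7,9,10,1")
    rewrite_file(path, "3")
    with open(path, encoding="utf-8") as f:
        print(f.read())
```

Comparing whole values rather than replacing the substring "3," avoids accidentally changing values such as 13 or 30.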
