I have a ".msg" file, and I wanted to do some textual parsing on it. I have not installed any package to convert a .msg file to any other kind of format.
When I do :
cat testing.msg <--- Shows up correctly and whatever in the screen is what I want
However when I do:
cat testing.msg > file <---- The file seems to be encoded when I do vi and look into the file.
testing.msg: CDF V2 Document, No summary info
How can I correctly to read the msg file so that I get the correct textual data?
I have tried changing the $LANG variables but that doesn't seem to work.
Related
I created a logic app to export some data to a *.csv file.
Data which will be exported contains german umlauts.
I read all the needed values into variables which are then concatenated and added to an array.
Finally I get an array of semicolon separated strings with the values in it.
This result will then be added to an email as file attachment:
All the values are handled correctly in the Logic App and are correct in the *.csv file but as soon I open the csv with Excel, the umlauts are not shown correctly anymore.
Is there a way to create explicitly a file with the correct encoding within the logic app and add the file to the email instead of the ExportString?
Or can I somehow encode the content of the ExportString-Variable?
Any hints?
I have reproduced in my environment and followed below steps to get correct output in CSV file:
My input is:
I have sent the data into CSV table as below and then created a file in file share as below:
Then when i open my file share and download the content from there i got different output as you got:
Then I opened my Azure Storage explorer and downloaded it as below:
When i open in notepad the downloaded file:
I get the correct output, try to do in this way
And when i save it as hello.csv and keep utf-8 with bom like below:
Then I get the correct output in csv as well:
I have a compressed file that's about 200 MB, in the form of a tar.gz file. I understand that I can extract the xml files in it. It contains several small and one 5 GB xml file. I'm trying to remove certain characters from the xml files.
So my very basic question is: is it even possible to accomplish this without ever extracting the content of the compressed file?
I'm trying to speed up the process of reading through xml files looking for characters to remove.
You will have to decompress, change, and then recompress the files. There's no way around that.
However, this does not necessarily include writing the file to a storage. You might be able to do the changes you like in a streaming fashion, i.e. that everything is only done in memory without ever having the complete decompressed file somewhere. Unix uses pipes for such tasks.
Here is an example on how to do it:
Create two random files:
echo "hello world" > a
echo "hello world" > b
Create a compressed archive containing both:
tar -c -z -f x.tgz a b
Pipe the contents of the uncompressed archive through a changer. Unfortunately I haven't found any shell-based way to do this but you also specified Python in the tags, and with the tarfile module you can achieve this:
Here is the file tar.py:
#!/usr/bin/env python3
import sys
import tarfile
tar_in = tarfile.open(fileobj=sys.stdin.buffer, mode='r:gz')
tar_out = tarfile.open(fileobj=sys.stdout.buffer, mode='w:gz')
for tar_info in tar_in:
reader = tar_in.extractfile(tar_info)
if tar_info.path == 'a': # my example file names are "a" and "b"
# now comes the code which makes our change:
# we just skip the first two bytes in each file:
reader.read(2) # skip two bytes
tar_info.size -= 2 # reduce size in info object as well
# add the (maybe changed) file to the output:
tar_out.addfile(tar_info, reader)
tar_out.close()
tar_in.close()
This can be called like this:
./tar.py < x.tgz > y.tgz
y.tgz will contain both files again, but in a the first two bytes will be skipped (so its contents will be llo world).
You will have noticed that you need to know the resulting size of your change beforehand. tar is designed to handle files, and so it needs to write the size of the entry files into the tar info datagram which precedes every entry file in the resulting file, so I see no way around this. With a compressed output it also isn't possible to skip back after writing all output and adjust the file size.
But as you phrased your question, this might be possible in your case.
All you will have to do is provide a file-like object (could be a Popen object's output stream) like reader in my simple example case.
I was looping some files to copy the content of somes file to a new file but after I run the code, the result shows lot of symbols in the new file, not the text content of the files I looped.
first, when I ran the code without putting the 'encoding' attribute in open file line, it showed an error message like,
UnicodeEncodeError: 'charmap' codec can't encode character '\x8b' in position 12: character maps to .
I tried various encodings like utf-8,latin1 but nothing worked and when i put 'errors=ignore' in the open file line, then the result showed like I described above.
import os
import glob
folder = os.path.join('R:', os.sep, 'Files')
def notes():
for doc in glob.glob(folder + r'\*'):
if doc.endswith('.pdf'):
with open(doc,'r') as f:
x = f.readlines()
with open('doc1.text', 'w+') as f1:
for line in x:
f1.write(line)
notes()
If I understand your example correctly and you’re trying to read PDF files, your problem is not one of encoding but of file format. PDF files don’t just to store your text in coding materials are unique format that you need to be able to read in order to extract the text. There are a couple of python libraries that can read PDF files (such as Py2PDF), please refer to this thread for more information: How to extract text from a PDF file?
I have written a Tclsh code that will fetch a zip file content in base64 format through xml-rpc method. I am dumping that base64 data into a file using the following snippet:
#!/usr/bin/tclsh
...
set mybase64Dump [myXmlRpcCallToReturnThisDump]
set zipFilePtr [open "xyz.zip" "w"]
puts $zipFilePtr $mybase64Dump
close $zipFilePte
Zip file was getting generated with XKbytes of size, but when trying to open using 7zip it says, Is not Archive. But I copy pasted the same base64 dump in a online converter. It was giving me a proper extractable zip file.
Is it something I am doing wrongly?
You probably need to configure the output file to be binary, not ascii. The default translation for a newly opened file is "auto", which does system-specific translation of the end-of-line characters, which is not what you want for a .zip file. Configure this using fconfigure on the handle after opening it or by adding the BINARY access flag to the open command.
See http://www.tcl.tk/man/tcl8.5/TclCmd/open.htm and http://www.tcl.tk/man/tcl8.5/TclCmd/fconfigure.htm for details on the syntax.
I have a module which is supposed to extract the email attachment and place it at a specific location. The code creates the POP3 Client to fetch the mail and has been using EMAIL::MIME::Attachment::Stripper module to extract the attachment as below.
my $mail=$pop->HeadAndBody($i);
my $parsed = Email::MIME->new($mail);
my $stripper = Email::MIME::Attachment::Stripper->new($parsed);
my #attachments = $stripper->attachments;
foreach my $a(#attachments)
{
next if $a->{content_type} !~ /octet-stream/i;
my $f = new IO::File "C:/MAIL_PARSING_DATA/" . "<filename>.<file-extension>", "w" or die "Can not create file!";
print $f $a->{payload};
goto EXITPOINT;
}
The code is working fine for the standard files identified by the Perl module, like spreadsheet etc. But not for the specific mail which has a zipped Excel file as an attachment. While extracting the file the identified content_type by the Perl script for this file is application/octet-stream. While extracting the file using the above-mentioned code the file seems to be broken because:
The file is not getting opened through WinZip, WinRAR or 7-Zip.
File size for the file extracted through this script is slightly different than the same file extracted using Outlook.
Kindly provide some input on the issue.
The problem is here
next if $a->{content_type} !~ /octet-stream/i;
zip have content type as application/zip
#NeoNox, Thanks for the response. But the perl system identifies the file as octet-stream only, not with application/zip.
Anyways, I could find the solution.
I could resolve the problem by writing the file in the binary mode(mentioning explicitely as below:
binmode $fh;
Here the file is being writting in the binary mode So I don't get the output as the broken file.
I am finding the same issue.
XLSX files are not seen as attachments.
I am using IMAP.
It will download the email body, any jpgs or pngs etc but does not find the attached xlsx even when in an email that has attachments that are downloaded.