change character set for tempfile.NamedTemporaryFile


I'm writing a Python 3 program that generates a text file, which is then post-processed with asciidoc to produce the final report in HTML and PDF.
The program generates thousands of files containing graphics to be included in the final report. The filenames are generated with tempfile.NamedTemporaryFile.
The problem is that the character set used by tempfile is defined as:
characters = "abcdefghijklmnopqrstuvwxyz0123456789_"
so I end up with some files named like "_6456_", and asciidoc interprets the "_" as formatting and inserts HTML that breaks the report.
I need to either find a way to "escape" the filenames in asciidoc or control the characters used in the temporary filename.
My current solution is to rename the temporary file after I close it, replacing the "_" with some other character (one not in the list used by tempfile, to avoid a collision), but I have the feeling that there is a better way to do it.
I would appreciate any ideas. I'm not very proficient with Python yet; I think overriding _RandomNameSequence in tempfile would work, but I'm not sure how to do it.
Regards.

A hacky way, based on manipulating tempfile internals (these are private names, so this may break in other Python versions):
import tempfile
class MyRandomSequence(tempfile._RandomNameSequence):
    characters = "xyz123"
tempfile._name_sequence = MyRandomSequence()
# make your temporary file
Example:
>>> tempfile.NamedTemporaryFile()
<open file '<fdopen>', mode 'w+b' at 0x1013b5540>
>>> k=_
>>> k.name
'/var/folders/Su/SuMQtmxiE941sUwe8d91lE+++TU/-Tmp-/tmp33x22z'

Maybe you could create a temporary directory using tempfile.mkdtemp() (or tempfile.TemporaryDirectory) and generate the filenames manually, such as file1, file2, ..., filen. This way you easily avoid "_" characters, and you can just delete the temporary directory when you are finished with it.
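A minimal sketch of that idea, assuming the graphics can live in one directory for the duration of the run. The helper name make_numbered_paths and the ".png" suffix are made up for illustration and are not part of tempfile:

```python
import os
import tempfile

def make_numbered_paths(directory, count, prefix="file", suffix=".png"):
    """Return `count` paths like file1.png, file2.png, ... inside `directory`."""
    return [os.path.join(directory, f"{prefix}{i}{suffix}")
            for i in range(1, count + 1)]

# TemporaryDirectory removes the directory and everything in it on exit.
with tempfile.TemporaryDirectory() as tmpdir:
    paths = make_numbered_paths(tmpdir, 3)
    for path in paths:
        with open(path, "wb") as f:
            f.write(b"placeholder graphic data")
    names = [os.path.basename(p) for p in paths]

print(names)
```

Note that the directory name itself is still generated by tempfile and may contain an underscore, so it is probably safest to refer to the files by their relative names in the report, or to create the directory yourself with a name you control.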

Why don't you write a name generator yourself?
Example:
import string
from random import choice
def generate():
    size = 9
    # string.letters no longer exists in Python 3; use string.ascii_letters
    return ''.join(choice(string.ascii_letters + string.digits) for i in range(size))
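A home-made generator loses the collision protection that tempfile provides, so here is a hedged sketch of one way to get it back: create the file with mode "x", which fails if the name is already taken, and retry. The function name create_safe_named_file, the SAFE_CHARS alphabet, and the retry count are assumptions for illustration only:

```python
import os
import secrets
import string

SAFE_CHARS = string.ascii_lowercase + string.digits  # alphabet without "_"

def create_safe_named_file(directory, suffix="", size=9, attempts=100):
    """Create a new empty file with a random name drawn from SAFE_CHARS.

    Returns the path of the file that was created.
    """
    for _attempt in range(attempts):
        name = "".join(secrets.choice(SAFE_CHARS) for _i in range(size)) + suffix
        path = os.path.join(directory, name)
        try:
            # Mode "x" raises FileExistsError instead of overwriting,
            # so an existing file is never reused.
            with open(path, "x"):
                pass
        except FileExistsError:
            continue
        return path
    raise FileExistsError("could not find an unused file name")
```

Usage would look like path = create_safe_named_file(some_directory, suffix=".png"), after which the file can be reopened for writing. Unlike NamedTemporaryFile, nothing deletes the file automatically.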

Related

Data hidden in jpg

I am currently looking for hidden data in a JPG file, but I have no clue how to proceed.
The JPG file contains text in a format I have never seen before:
-ne \xff\xd8\xff\xe0\x00\x10\x4a\x46\x49\x46\x00\x01\x01\x01\x00\x60\x00\x60\x00\x00\xff\xdb\x00\x43\x00\x06\x04\x04\x05\x04\x04\x06\x05\x05\x05\x06\x06\x06\x07\x09\x0e\x09\x09\x08\x08\x09\x12\x0d\x0d\x0a\x0e\x15\x12\x16\x16\x15\x12\x14\x14\x17\x1a\x21\x1c\x17\x18\x1f\x19\x14\x14\x1d\x27\x1d\x1f\x22\x23\x25\x25\x25\x16\x1c\x29\x2c\x28\x24\x2b\x21\x24\x25\x24\xff\xdb\x00\x43\x01\x06\x06\x06\x09\x08\x09\x11\x09\x09\x11\x24\x18\x14\x18\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\xff\xc0\x00\x11\x08\x01\x8e\x03\x4e\x03\x01\x22\x00\x02\x11\x01\x03\x11\x01\xff\xc4\x00\x1f\x00\x00\x01\x05\x01\x01\x01\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\xff\xc4\x00\xb5\x10\x00\x02\x01\x03\x03\x02\x04\x03\x05\x05\x04\x04\x00\x00\x01\x7d\x01\x02\x03\x00\x04\x11\x05\x12\x21\x31\x41\x06\x13\x51\x61\x07\x22\x71\x14\x32\x81\x91\xa1\x08\x23
-ne \x42\xb1\xc1\x15\x52\xd1\xf0\x24\x33\x62\x72\x82\x09\x0a\x16\x17\x18\x19\x1a\x25\x26\x27\x28\x29\x2a\x34\x35\x36\x37\x38\x39\x3a\x43\x44\x45\x46\x47\x48\x49\x4a\x53\x54\x55\x56\x57\x58\x59\x5a\x63\x64\x65\x66\x67\x68\x69\x6a\x73\x74\x75\x76\x77\x78\x79\x7a\x83\x84\x85\x86\x87\x88\x89\x8a\x92\x93\x94\x95\x96\x97\x98\x99\x9a\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xff\xc4\x00\x1f\x01\x00\x03\x01\x01\x01\x01\x01\x01\x01\x01\x01\x00\x00\x00\x00\x00\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\xff\xc4\x00\xb5\x11\x00\x02\x01\x02\x04\x04\x03\x04\x07\x05\x04\x04\x00\x01\x02\x77\x00\x01\x02\x03\x11\x04\x05\x21\x31\x06\x12\x41\x51\x07\x61\x71\x13\x22\x32\x81\x08\x14\x42\x91\xa1\xb1\xc1\x09\x23\x33\x52\xf0\x15\x62\x72\xd1\x0a\x16\x24\x34\xe1\x25\xf1\x17\x18\x19\x1a\x26\x27\x28\x29\x2a\x35\x36\x37\x38\x39\x3a\x43\x44\x45\x46\x47\x48\x49
This is just the beginning of the file; there are at least a hundred lines.
The file type reported by the file command is: file.jpg: ASCII text, with very long lines
I tried some of the common tools for identifying patterns or hidden data, such as exiftool, strings, and xxd, but I found nothing.
If you have any idea what to do, it would be very much appreciated.

If this is a CTF challenge, there are some common ways to find the flag.
First, look for the flag in the file metadata, such as the description field.
You can also try the tool stegsolve.jar.
For more advanced cases, where the hidden data is embedded using some mathematical calculation, give zsteg a try.

Perhaps I'm misunderstanding the problem here, but if your file actually starts with a backslash character followed by the characters x, f, f, \, x, d, 8 and so on, then what you're looking at is the binary content of a JPG file that has been converted into ASCII text.
If so, you need to convert this back into binary data. For example, in Linux or MacOS, you could do this by entering the following on the command line:
echo -ne '\xff\xd8\xff\xe0\x00\x10\x4a\x46\x49\x46\x00\x01...etc...' > img.jpg
echo -ne '\x42\xb1\xc1\x15\x52\xd1\xf0\x24\x33\x62\x72\x82...etc...' >> img.jpg
(Note: > sends the results to a new file, and >> appends to the end of the file)
Or alternatively in Python:
with open("img.jpg", "wb") as f:
    f.write(b'\xff\xd8\xff\xe0\x00\x10\x4a\x46\x49\x46\x00\x01...etc...')
    f.write(b'\x42\xb1\xc1\x15\x52\xd1\xf0\x24\x33\x62\x72\x82...etc...')
    # and so on for all the other lines
Either way, you should end up with a file called img.jpg containing the image you're after.
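With a hundred or more lines, pasting each one by hand gets tedious. The conversion could be automated along these lines; this is only a sketch, and it assumes every line consists solely of the "-ne " prefix followed by \xNN escapes, as in the excerpt from the question. Anything that is not a \xNN escape is ignored, so it would silently drop other escape forms or literal characters if the file contains them. The function names are made up for illustration:

```python
import re

# Matches one \xNN escape and captures the two hex digits.
HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")

def escaped_text_to_bytes(text):
    """Convert every \\xNN escape found in `text` to the byte it names."""
    return bytes(int(digits, 16) for digits in HEX_ESCAPE.findall(text))

def convert_file(src, dst):
    """Read the escaped text from `src` and write the decoded bytes to `dst`."""
    with open(src, "r", encoding="ascii") as f:
        data = escaped_text_to_bytes(f.read())
    with open(dst, "wb") as f:
        f.write(data)
    return len(data)

# Example call, using the file names from the question and answer:
# convert_file("file.jpg", "img.jpg")
```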

How to replace multiple tabs with only one tab using python3 + Pandas in a given .csv File

I'm trying to replace multiple tabs with a single tab using Python 3 + Pandas in a given .csv file, but I can't find a way to solve this problem. Suppose my function is:
def function(csv_file):
    # remove multiple tabs, i.e. turn a \t \t b into a \t b
    [...]
The file must remain a CSV file.
How could I do it?

A csv is just a text file that can be parsed with tailored tools, but can also be read as plain text. So, you can use regex to substitute consecutive \t instances.
You still need to provide more details, but take this as a provisional answer.
import re
with open('test.csv', 'r') as fo:
    text = fo.read()
print(text)
print(repr(text))
text = re.sub(r'\t+', r'\t', text)
print(text)
print(repr(text))
Output
test sdasdf
asfasdf asdf asfasdf asdf
'test\t\tsdasdf\nasfasdf\tasdf\tasfasdf\t\tasdf'
# after regex
test sdasdf
asfasdf asdf asfasdf asdf
'test\tsdasdf\nasfasdf\tasdf\tasfasdf\tasdf'
Notice the last print does not have any consecutive tabs.
Now you can write back to csv.
import os
with open('test_temp.csv', 'w') as fo:
    fo.write(text)
# os.remove('test.csv')
# os.rename('test_temp.csv', 'test.csv')
It is a good idea to write a temp file, remove the original, and finally rename the temp. This is so you have a safe copy at all times for odd situations like corrupt file writes, power outages, or any other contingency.
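The temp-file-then-rename step can be folded into one function. The sketch below swaps in os.replace, which overwrites the destination in a single step, so there is no moment where test.csv has been removed but not yet recreated. The function name collapse_tabs_in_file is made up for illustration. One caveat to keep in mind: in a tab-separated file, two consecutive tabs normally mean an empty field, so collapsing them shifts the columns.

```python
import os
import re
import tempfile

def collapse_tabs_in_file(path):
    """Replace each run of tabs in `path` with a single tab, in place."""
    # newline="" keeps the file's original line endings untouched.
    with open(path, "r", newline="") as f:
        text = f.read()
    text = re.sub(r"\t+", "\t", text)

    # Create the temp file next to the original so os.replace stays on
    # one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as f:
            f.write(text)
        os.replace(tmp_path, path)  # overwrite the original in one step
    except BaseException:
        os.remove(tmp_path)  # do not leave a stray temp file behind
        raise
```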

rename command for replacing text in filename from a certain point (character), but only up to, and maintaining the file extension

I've got a ton of files as follows
audiofile_drums_1-ktpcwybsh5c.wav
soundsample_drums_2-fghlkjy57sa.wav
noise_snippet_guitar_5-mxjtgqta3o1.wav
louder_flute_9-mdlsiqpfj6c.wav
I want to remove everything between and including the "-" and the .wav file extension, to be left with
audiofile_drums_1.wav
soundsample_drums_2.wav
noise_snippet_guitar_5.wav
louder_flute_9.wav
I've tried to delete everything following and including the character "-" using
rename 's/-.*//' *
Which gives me
audiofile_drums_1
soundsample_drums_2
noise_snippet_guitar_5
louder_flute_9
Rather than hunt for an easy way to rename all the files again to add the .wav extension back, I am hoping there is a slicker way to do this with one command in one stage instead of two.
Any suggestions?
Thanks

You can use rename 's/-[^\.]*\.wav$/\.wav/' *
The first part, -[^\.]*\.wav$, searches for a - followed by any number of characters that are not a ., followed by .wav at the end of the filename. Anchoring on .wav and the end of the filename is not strictly needed, but it helps avoid renaming files you don't want to rename.
The replacement, /\.wav/, preserves the extension.
Please note that rename is not a standard utility; this version is a Perl script, so it may not be available on every Linux system.
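On systems without the Perl rename script, the same substitution can be tried out in Python first. This is a rough sketch that only computes the new names; the helper names stripped_name and plan_renames are made up for illustration, and the actual renaming (for example with os.rename) is left out on purpose so nothing is changed by accident:

```python
import re

# Same idea as the rename expression: a "-", then any characters that
# are not a ".", then ".wav" at the very end of the name.
PATTERN = re.compile(r"-[^.]*\.wav$")

def stripped_name(filename):
    """Return `filename` without the '-xxxx' part before a final '.wav'."""
    return PATTERN.sub(".wav", filename)

def plan_renames(filenames):
    """Return (old, new) pairs for the names the pattern would change."""
    return [(name, stripped_name(name))
            for name in filenames
            if stripped_name(name) != name]

for old, new in plan_renames([
    "audiofile_drums_1-ktpcwybsh5c.wav",
    "louder_flute_9-mdlsiqpfj6c.wav",
    "notes.txt",
]):
    print(old, "->", new)
```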

This works in my specific case, and the same approach should work for any file extension.
rename -n 's/-.*(?=\.wav$)//' *
The command matches the - symbol and all characters after it in the filename, up to a positive lookahead** (?=\.wav$) that requires the file extension at the end of the filename (denoted by $), and replaces the match with nothing (removing it).
** NOTE: A positive lookahead is a zero-width assertion. It affects the match, but it is not included in the replacement. (The '.wav' part will not be erased.)
In this example, (?=\.wav$) is the positive lookahead. The dollar sign $, as usual in regex, denotes the end of the string, which makes it a good fit for a file extension.
The -n flag only prints what would be renamed; drop it to actually rename the files.

Why do apostrophes (" ' ") turn into ▒'s when reading from a file in Python?

I used Bash to run a Python file. The Python file should read a UTF-8 file and display it in the terminal. Instead, it gives me a bunch of ▒'s ("Aaron▒s" instead of "Aaron's"). Here's the code:
# It reads text from a text file (done).
f = open("draft.txt", "r", encoding="utf8")
# It handles apostrophes and non-ASCII characters.
print(f.read())
I've tried different combinations of:
read formats with the open function ("r" and "rb")
strip() and rstrip() method calls
decode() method calls
text file encoding (specifically ANSI, Unicode, Unicode big endian, and UTF-8).
It still doesn't display apostrophes (" ' ") properly. How do I make it display apostrophes instead of ▒'s?

The issue is with Git Bash. If I switch to PowerShell, Python displays the apostrophes (Aaron's) perfectly. The display errors (Aaron▒s) appear only in Git Bash. I'll give more details if I learn more about it.
Update: @jasonharper and @entpnerd suggested that the draft.txt apostrophe might be "apostrophe-ish" and not a legitimate apostrophe. I compared the draft.txt apostrophe (copied and pasted from a Google Doc) with an apostrophe entered directly. They look different (’ vs. '). In xxd, the hex value for the apostrophe-ish character is 92. An actual apostrophe is 27. Git Bash only displays the latter correctly (unless there's just something I need to configure, which is more likely).
Second Update: Clarified that I'm using Git Bash. I wasn't aware that there were multiple terminals (is that the right way of putting it?) that ran Bash.
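If the terminal cannot be reconfigured, one workaround is to normalize the typographic quotes to plain ASCII before printing. The sketch below assumes the file was saved as Windows-1252 ("ANSI"); that is an inference from the single byte 92 seen in xxd, since in UTF-8 the same character would be the three bytes e2 80 99. The names QUOTE_MAP and asciify_quotes are made up for illustration:

```python
# Map typographic quotes to their plain ASCII counterparts.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark (the "apostrophe-ish" one)
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})

def asciify_quotes(text):
    """Return `text` with curly quotes replaced by straight ASCII quotes."""
    return text.translate(QUOTE_MAP)

# Byte 0x92 is the right single quotation mark in Windows-1252.
curly = b"Aaron\x92s".decode("cp1252")
plain = asciify_quotes(curly)
print(plain)
```

When reading the real file, the same function would be applied to the result of f.read(), with the encoding passed to open() matching how the file was actually saved.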

Python - How do I separate data into multiple lines

I have two strings that I want to write to a txt file, but when I try to write them, they both end up on the first line. I want each string on a separate line; how do I do so?
Here is the writing part of my code:
saveFile = open('points.txt', 'w')
saveFile.write(str(jakesPoints))
saveFile.write(str(alexsPoints))
saveFile.close
If jakesPoints was 10 and alexsPoints was 12, then the text file would contain
1012
but I want it to be
10
12

You can use a newline character (\n) to move to a new line. For your example:
with open('points.txt', 'w') as saveFile:
    saveFile.write("{}\n".format(jakesPoints))
    saveFile.write("{}\n".format(alexsPoints))
Other things to note:
It is helpful to open files using with: it closes the file automatically when the block ends (which is typically preferred over trying to remember to call .close()).
The "{}\n".format() call converts your numbers to strings and adds the newline character. I found that https://pyformat.info/ explains the string formatters pretty well and highlights all the main advantages.

with open('points.txt', 'w') as saveFile:
    saveFile.write(str(jakesPoints))
    saveFile.write("\n")
    saveFile.write(str(alexsPoints))
See the difference between the w and a modes used in open(). Also see join().
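For the join() suggestion, a small sketch of how it might look when there are more than two values; the function name write_lines is made up for illustration:

```python
def write_lines(path, values):
    """Write each value on its own line, converting it to a string first."""
    with open(path, "w") as f:
        # join() puts "\n" between the items; the final "\n" ends the last line.
        f.write("\n".join(str(value) for value in values) + "\n")

# Example call with the variables from the question:
# write_lines("points.txt", [jakesPoints, alexsPoints])
```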
