I've got an AppleScript that renames files according to a CSV file. The script works, but it has issues with the encoding of the strings/filenames.
Finding the files works perfectly, but renaming one to something like Malmö results in a very weirdly encoded string.
The CSV originates from Microsoft Excel, and I suspect it is not properly UTF-8 encoded. Now I'm stuck on how to handle the encoding properly (or how to convert it). As far as I know, the file has the default Excel encoding, ISO 8859-1.
set theFile to (choose file with prompt "Select the CSV file")
set thePath to (choose folder with prompt "Select directory") as string
set theCSVData to paragraphs of ((read theFile))
set {oldTID, my text item delimiters} to {my text item delimiters, ";"}
repeat with thisLine in theCSVData
    set {oldFileName, newFileName} to text items of thisLine
    if length of oldFileName > 0 then
        set oldFile to thePath & oldFileName
        set newFile to newFileName
        tell application "System Events"
            if exists file oldFile then
                set name of file oldFile to newFile
            end if
        end tell
    end if
end repeat
So my question is: how do I fix the encoding issue, either by reading the file properly or by converting its encoding first (through AppleScript)?
I suspect the filenames are encoded as UTF-16LE, but I could be wrong. In any case, there is a command-line utility named iconv (see man iconv) that you can experiment with in Terminal, and perhaps use to re-encode the filename. If you receive the output of a do shell script as text or Unicode text, you get back UTF-16, so it won't be corrupted after the conversion (before you use it as a filename).
Use:
set theCSVData to paragraphs of ((read theFile as «class utf8»))
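To see why reading with the wrong encoding produces a garbled name, the effect can be reproduced outside AppleScript. The following is a minimal Python sketch, not part of the original script. It assumes the bytes in the CSV are UTF-8 and shows what happens when they are decoded as ISO 8859-1 or Mac OS Roman instead. The helper name decode_as is made up for this illustration.

```python
# Illustration only: the same bytes decoded with different encodings.
name = "Malmö"
raw = name.encode("utf-8")  # bytes as they would sit in a UTF-8 CSV file

def decode_as(data: bytes, encoding: str) -> str:
    """Decode raw bytes with the given encoding (hypothetical helper)."""
    return data.decode(encoding)

print(decode_as(raw, "utf-8"))      # decoded with the matching encoding
print(decode_as(raw, "latin-1"))    # mis-decoded as ISO 8859-1
print(decode_as(raw, "mac_roman"))  # mis-decoded as Mac OS Roman
```

Only the first call returns the original name; the other two turn the two UTF-8 bytes of ö into two unrelated characters. If the file really is ISO 8859-1 rather than UTF-8, converting it first (for example with iconv, as suggested above) is one option.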
I am currently looking for hidden data in a jpg file, but I have no clue how to proceed.
There is a jpg file containing text in a format I have never seen before:
-ne \xff\xd8\xff\xe0\x00\x10\x4a\x46\x49\x46\x00\x01\x01\x01\x00\x60\x00\x60\x00\x00\xff\xdb\x00\x43\x00\x06\x04\x04\x05\x04\x04\x06\x05\x05\x05\x06\x06\x06\x07\x09\x0e\x09\x09\x08\x08\x09\x12\x0d\x0d\x0a\x0e\x15\x12\x16\x16\x15\x12\x14\x14\x17\x1a\x21\x1c\x17\x18\x1f\x19\x14\x14\x1d\x27\x1d\x1f\x22\x23\x25\x25\x25\x16\x1c\x29\x2c\x28\x24\x2b\x21\x24\x25\x24\xff\xdb\x00\x43\x01\x06\x06\x06\x09\x08\x09\x11\x09\x09\x11\x24\x18\x14\x18\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\xff\xc0\x00\x11\x08\x01\x8e\x03\x4e\x03\x01\x22\x00\x02\x11\x01\x03\x11\x01\xff\xc4\x00\x1f\x00\x00\x01\x05\x01\x01\x01\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\xff\xc4\x00\xb5\x10\x00\x02\x01\x03\x03\x02\x04\x03\x05\x05\x04\x04\x00\x00\x01\x7d\x01\x02\x03\x00\x04\x11\x05\x12\x21\x31\x41\x06\x13\x51\x61\x07\x22\x71\x14\x32\x81\x91\xa1\x08\x23
-ne \x42\xb1\xc1\x15\x52\xd1\xf0\x24\x33\x62\x72\x82\x09\x0a\x16\x17\x18\x19\x1a\x25\x26\x27\x28\x29\x2a\x34\x35\x36\x37\x38\x39\x3a\x43\x44\x45\x46\x47\x48\x49\x4a\x53\x54\x55\x56\x57\x58\x59\x5a\x63\x64\x65\x66\x67\x68\x69\x6a\x73\x74\x75\x76\x77\x78\x79\x7a\x83\x84\x85\x86\x87\x88\x89\x8a\x92\x93\x94\x95\x96\x97\x98\x99\x9a\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xff\xc4\x00\x1f\x01\x00\x03\x01\x01\x01\x01\x01\x01\x01\x01\x01\x00\x00\x00\x00\x00\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\xff\xc4\x00\xb5\x11\x00\x02\x01\x02\x04\x04\x03\x04\x07\x05\x04\x04\x00\x01\x02\x77\x00\x01\x02\x03\x11\x04\x05\x21\x31\x06\x12\x41\x51\x07\x61\x71\x13\x22\x32\x81\x08\x14\x42\x91\xa1\xb1\xc1\x09\x23\x33\x52\xf0\x15\x62\x72\xd1\x0a\x16\x24\x34\xe1\x25\xf1\x17\x18\x19\x1a\x26\x27\x28\x29\x2a\x35\x36\x37\x38\x39\x3a\x43\x44\x45\x46\x47\x48\x49
This is just the beginning of the file, as there are at least a hundred lines.
The file command reports the type as: file.jpg: ASCII text, with very long lines
I tried some of the common tools to identify any patterns or hidden data, like exiftool, strings, and xxd, but I found nothing.
If you have any idea what to do, it would be very much appreciated.
If it's a CTF challenge, there are some common ways to find the flag.
First, try to find the flag in the file metadata, such as the file's description field.
You can also try the tool stegsolve.jar.
In more advanced cases, the stego information is hidden through some mathematical calculation; give the tool zsteg a try.
Perhaps I'm misunderstanding the problem here, but if your file actually starts with a backslash character followed by the characters x, f, f, \, x, d, 8 and so on, then what you're looking at is the binary content of a JPG file that has been converted into ASCII text.
If so, you need to convert this back into binary data. For example, on Linux or macOS, you could do this by entering the following on the command line:
echo -ne '\xff\xd8\xff\xe0\x00\x10\x4a\x46\x49\x46\x00\x01...etc...' > img.jpg
echo -ne '\x42\xb1\xc1\x15\x52\xd1\xf0\x24\x33\x62\x72\x82...etc...' >> img.jpg
(Note: > sends the results to a new file, and >> appends to the end of the file)
Or alternatively in Python:
with open("img.jpg","wb") as f:
    f.write(b'\xff\xd8\xff\xe0\x00\x10\x4a\x46\x49\x46\x00\x01...etc...')
    f.write(b'\x42\xb1\xc1\x15\x52\xd1\xf0\x24\x33\x62\x72\x82...etc...')
    # and so on for all the other lines
Either way, you should end up with a file called img.jpg containing the image you're after.
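If the file has many lines, pasting each one by hand is tedious. A minimal Python sketch of automating the conversion follows. It assumes every line looks like the sample in the question (an optional leading -ne followed only by \xNN escapes); anything else on a line is ignored. The function names decode_escaped_line and convert are placeholders, not part of any library.

```python
import re

# Matches one \xNN escape and captures the two hex digits.
HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")

def decode_escaped_line(line: str) -> bytes:
    """Turn one line of \\xNN escapes into the bytes it represents."""
    line = line.strip()
    if line.startswith("-ne "):
        line = line[len("-ne "):]  # drop the echo flags at the start of the line
    return bytes(int(h, 16) for h in HEX_ESCAPE.findall(line))

def convert(src_path: str, dst_path: str) -> None:
    """Decode every line of src_path and write the raw bytes to dst_path."""
    with open(src_path, "r", encoding="ascii") as src, open(dst_path, "wb") as dst:
        for line in src:
            dst.write(decode_escaped_line(line))
```

Calling convert("file.jpg", "img.jpg") would then write the decoded bytes to img.jpg, where both file names are only examples.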
I've searched the web and tried a number of different things I've read there, but I don't seem to get the desired result.
I'm using Windows 7 and Python 3.6.
I'm connecting to an Oracle db with cx_oracle and creating a text file with the query results. The file that is created (which I'll call my_file.txt to make it easy) has 3688 lines in it, all ending with CRLF, which need to be converted to the Unix LF.
If I run python crlf.py my_file.txt, it is all converted correctly and there are no issues, but that means I need to run another command manually, which I do not want to do.
So I tried adding the code below to my file.
filename = "NameOfFileToBeConverted"
fileContents = open(filename,"r").read()
f = open(filename,"w", newline="\n")
f.write(fileContents)
f.close()
This does convert the majority of the CRLF to LF, but at line 3501 there is a NUL character repeated 3500 times on one line, followed by a row of data from the database, and that line ends with CRLF; every line from there on still has CRLF.
So with that not working, I removed it and then tried
import subprocess
subprocess.Popen("crlf.py "+ filename, shell=True)
I also tried using
import os
os.system("crlf.py "+ filename)
The "+ filename" in the two examples above is just providing the filename that is created during the data extract.
I don't know what else to try from here.
Convert Line Endings in-place (with Python 3)
Windows to Linux/Unix
Here is a short script for directly converting Windows line endings (\r\n also called CRLF) to Linux/Unix line endings (\n also called LF) in-place (without creating an extra output file):
# replacement strings
WINDOWS_LINE_ENDING = b'\r\n'
UNIX_LINE_ENDING = b'\n'
# relative or absolute file path, e.g.:
file_path = r"c:\Users\Username\Desktop\file.txt"
with open(file_path, 'rb') as open_file:
    content = open_file.read()
content = content.replace(WINDOWS_LINE_ENDING, UNIX_LINE_ENDING)
with open(file_path, 'wb') as open_file:
    open_file.write(content)
Linux/Unix to Windows
Just swap the line endings to content.replace(UNIX_LINE_ENDING, WINDOWS_LINE_ENDING).
Code Explanation
Important: Binary Mode
We need to make sure that we open the file both times in binary mode (mode='rb' and mode='wb') for the conversion to work.
When opening files in text mode (mode='r' or mode='w' without b), the platform's native line endings (\r\n on Windows and \r on old Mac OS versions) are automatically converted to Python's Unix-style line endings: \n. So the call to content.replace() wouldn't find any \r\n sequences to replace.
In binary mode, no such conversion is done.
Binary Strings
In Python 3, strings are Unicode text (str) unless declared otherwise. But we open our files in binary mode, so we need to add b in front of our replacement strings to make them bytes objects that match the file content.
Raw Strings
On Windows the path separator is a backslash \, which we would need to escape in a normal Python string as \\. By adding r in front of the string we create a so-called raw string, which doesn't need any escaping. So you can directly copy/paste the path from Windows Explorer.
Alternative
We open the file twice to avoid the need to reposition the file pointer. We could also have opened the file once with mode='rb+', but then we would have needed to move the pointer back to the start after reading its content (open_file.seek(0)) and truncate its original content before writing the new one (open_file.truncate(0)).
Simply opening the file again in write mode does that automatically for us.
Cheers and happy programming,
winklerrr
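The single-open alternative described in the answer can be sketched as follows. This is a variation under one assumption: instead of calling truncate(0) before writing, it writes from the start and then truncates at the current position, which removes the leftover bytes because the converted content is never longer than the original. The function name crlf_to_lf_in_place is made up for this example.

```python
def crlf_to_lf_in_place(file_path: str) -> None:
    """Convert CRLF to LF using a single open() in 'rb+' mode."""
    with open(file_path, "rb+") as open_file:
        content = open_file.read()
        open_file.seek(0)  # move the pointer back to the start
        open_file.write(content.replace(b"\r\n", b"\n"))
        open_file.truncate()  # cut off the old bytes past the new end
```

Because the file stays in binary mode throughout, NUL bytes or other unusual content pass through unchanged.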
I am trying to write a file to my home folder (I am using Linux). Writing the file to /tmp works.
put shell("echo $HOME") into last1
The code above gets the home folder, and I put the path into the variable last1.
put the text of field "bash1" into URL "file:last1/dic.sh"
Here bash1 is a text field that contains a shell script I want to write to my home directory.
The code below works:
put the text of field "bash1" into URL "file:/tmp/dic.sh"
How should I rewrite my code?
As your last1 variable is enclosed in quotes, it's getting treated as a literal string rather than a variable. The following would work:
put field "bash1" into URL ("file:" & last1 & "/dic.sh")
Note that you do not have to refer explicitly to the text property when putting texts from fields - you can do the above. Furthermore, if you're on Linux, you can just use the ~ shortcut to refer to the user's home directory:
put field "bash1" into URL "file:~/dic.sh"
You have to "open" the file, write your stuff and "close" the file again.
Try something like:
put last1 & "/dic.sh" into myFile
open file myFile for write
write the text of field "bash1" to file myFile
close file myFile
Complete novice here, so all help gratefully received. I am hoping to edit a text file with a batch file.
I have a text file that is all one line.
I need the first 240 characters deleted
I need the last 82 characters deleted
Finally, anything in between I need each 100 characters to be separated with a line break
Thanks
Mark
Try this (pure batch, no VBS):
@ECHO OFF &SETLOCAL
SET "longstring=012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789X012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789X0123456789012345678901234567890123456789012345678901234567890123456789012345678901"
ECHO %longstring%
ECHO(
REM I need the first 240 characters deleted
SET "longstring=%longstring:~240%"
ECHO %longstring%
ECHO(
REM I need the last 82 characters deleted
SET "longstring=%longstring:~0,-82%"
ECHO %longstring%
ECHO(
REM I need each 100 characters to be separated with a line break
SET LB=^
SET "right=%longstring%"
SET "longstring="
SETLOCAL ENABLEDELAYEDEXPANSION
:loop
SET "left=%right:~0,100%"
SET "right=%right:~101%"
SET "longstring=!longstring!!left!!LB!"
IF DEFINED right GOTO :loop
ECHO(!longstring!
..output is:
012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789X012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789X0123456789012345678901234567890123456789012345678901234567890123456789012345678901
X012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789X0123456789012345678901234567890123456789012345678901234567890123456789012345678901
X012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789X
X012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789X
Your problem is easily solved with nearly any programming language... except Windows batch :(
It is nearly impossible with Windows batch if your file is longer than 8191 bytes because batch variables are limited to 8191 characters. There is a complicated method that uses FC /B to effectively read one byte at a time, expressed as hex. It would be relatively easy to do the counting. But then each hex code would need to be converted back into the character value. You do not want to go through all that pain, except for maybe academic interest.
You could use PowerShell, VBScript, or JScript to easily do what you want.
But if you want to stick with batch, then you will need a non-standard utility to help you.
There are free utilities you could download that would make the solution trivial. For example, sed for Windows could make short work of this problem.
I have written a hybrid JScript/batch utility called REPL.BAT that performs regular expression search and replace on stdin, and writes the result to stdout. The utility is pure script that works on any Windows machine from XP onward - no exe download required. REPL.BAT is available here. Complete documentation is embedded within the script.
Assuming REPL.BAT is either in your current directory, or better yet, somewhere within your PATH, then the following simple batch script should do the trick. The script takes the name of the file to modify as the first and only argument. The file spec can include path information. Be sure to enclose the file in quotes if it contains spaces or other special characters.
@echo off
type %1|repl "^.{240}(.*).{82}$" "$1"|repl ".{100}" "$&\r\n" x >"%~1.mod"
move /y "%~1.mod" %1 >nul
I'd strongly recommend switching to PowerShell or at least VBScript if at all possible. It'd be a lot easier to do what you want with those languages.
PowerShell:
$filename = 'C:\path\to\your.txt'
# ReadAllText returns the file as one string; Out-String would append an extra line break
([IO.File]::ReadAllText($filename)) `
    -replace '^[\s\S]{240}', '' `
    -replace '[\s\S]{82}$','' `
    -replace '([\s\S]{100})',"`$1`r`n" `
    | Set-Content $filename
VBScript:
filename = "C:\path\to\your.txt"
Set fso = CreateObject("Scripting.FileSystemObject")
Set re = New RegExp
re.Global = True
txt = fso.OpenTextFile(filename).ReadAll
re.Pattern = "^[\s\S]{240}"
txt = re.Replace(txt, "")
re.Pattern = "[\s\S]{82}$"
txt = re.Replace(txt, "")
re.Pattern = "([\s\S]{100})"
txt = re.Replace(txt, "$1" & vbNewLine)
fso.OpenTextFile(filename, 2).Write txt
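For comparison, the same three steps can be sketched in Python with plain slicing instead of regular expressions. This is only an illustration under the assumption that the whole file fits in memory as one string; the function name trim_and_wrap and its parameter names are made up.

```python
def trim_and_wrap(text: str, head: int = 240, tail: int = 82, width: int = 100) -> str:
    """Drop `head` leading and `tail` trailing characters, then break every `width` characters."""
    middle = text[head:len(text) - tail]
    # Cut the remaining text into consecutive chunks of at most `width` characters.
    chunks = [middle[i:i + width] for i in range(0, len(middle), width)]
    return "\n".join(chunks)
```

Unlike the regex versions above, this joins the chunks with a line break between them, so the last chunk is not followed by one.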
I'm writing a Python 3 program that generates a text file that is post-processed with asciidoc to produce the final report in HTML and PDF.
The Python program generates thousands of files with graphics to be included in the final report. The filenames are generated with tempfile.NamedTemporaryFile.
The problem is that the character set used by tempfile is defined as:
characters = "abcdefghijklmnopqrstuvwxyz0123456789_"
so I end up with some files with names like "_6456_", and asciidoc interprets the "_" as formatting and inserts some HTML that breaks the report.
I need to either find a way to "escape" the filenames in asciidoc or control the characters in the temporary file.
My current solution is to rename the temporary file after I close it, replacing the "_" with some other character (one not in the list of characters used by tempfile, to avoid a collision), but I have the feeling that there is a better way to do it.
I would appreciate any ideas. I'm not very proficient with Python yet; I think overriding _RandomNameSequence in tempfile would work, but I'm not sure how to do it.
regards.
Hack way, based on manipulating tempfile internals:
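import tempfile
class MyRandomSequence(tempfile._RandomNameSequence):
    characters = "xyz123"
# _name_sequence is a private attribute, so this may break in other Python versions
tempfile._name_sequence = MyRandomSequence()
# make your temporary file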
Example:
>>> tempfile.NamedTemporaryFile()
<open file '<fdopen>', mode 'w+b' at 0x1013b5540>
>>> k=_
>>> k.name
'/var/folders/Su/SuMQtmxiE941sUwe8d91lE+++TU/-Tmp-/tmp33x22z'
Maybe you could create a temporary directory using tempfile.mkdtemp() and generate the filenames manually, such as file1, file2, ..., filen. This way you easily avoid "_" characters, and you can just delete the temporary directory after you are finished with it.
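A minimal sketch of that idea, using tempfile.TemporaryDirectory from the standard library so the directory is cleaned up automatically. The helper name sequential_paths, the "file" prefix, and the ".png" suffix are assumptions made for this example. Note that the directory name itself is still random and may contain "_", so it may be safer to refer to the files relative to that directory in the asciidoc source.

```python
import os
import tempfile

def sequential_paths(count: int, directory: str, prefix: str = "file", suffix: str = ".png") -> list:
    """Return `count` paths named file1.png, file2.png, ... inside `directory`."""
    return [os.path.join(directory, f"{prefix}{i}{suffix}") for i in range(1, count + 1)]

with tempfile.TemporaryDirectory() as tmp_dir:
    paths = sequential_paths(3, tmp_dir)
    for path in paths:
        with open(path, "wb") as f:
            f.write(b"")  # placeholder for the real graphic
    names = [os.path.basename(p) for p in paths]
    print(names)
# the directory and everything in it is removed when the with-block exits
```

Since the names are generated by a counter rather than at random, no "_" can appear in them and there is no risk of collisions within the directory.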
Why don't you create a generator yourself?
Example:
import string
from random import choice
def generate():
    size = 9
    # string.ascii_letters is the Python 3 replacement for Python 2's string.letters
    return ''.join(choice(string.ascii_letters + string.digits) for i in range(size))