Converts strings of binary to binary - python-3.x

I have a text file and I would like to read it in binary so I can transform its content into hexadecimal characters.
Then, I need to replace '20' with '0' and '80', 'e2', '8f' with '1'.
This would create a string of 0s and 1s (basically binary).
Finally, I need to convert this binary string into ASCII characters.
I'm almost finished, but I'm struggling with the last part:
import binascii
import sys
bin_file = 'TheMessage.txt'
with open(bin_file, 'rb') as file:
    file_content = file.read().hex()
file_content = file_content.replace('20', '0').replace('80', '1').replace('e2', '1').replace('8f', '1')
print(file_content)
text_bin = binascii.a2b_uu(file_content)
The last line produces an error (I do not fully understand strings/hex/binary interpretation in python):
Traceback (most recent call last):
  File "binary_to_string.py", line 34, in <module>
    text_bin = binascii.a2b_uu(file_content)
binascii.Error: Trailing garbage
Could you give me a hand?
I'm working on this file: blank_file

I think you're looking for something like this? Refer to comments for why I do what I did.
import binascii
import sys
bin_file = 'TheMessage.txt'
with open(bin_file, 'rb') as file:
    file_content = file.read().hex()
file_content = file_content.replace('20', '0').replace('80', '1').replace('e2', '1').replace('8f', '1')
# First we split the string into 8-character chunks, one chunk per byte.
bin_list = []
for i in range(0, len(file_content), 8): # 8 bits in a byte!
    bin_list.append(file_content[i:i+8])
message = ""
for binary_value in bin_list:
    binary_integer = int(binary_value, 2) # Parse the base-2 string as an integer
    ascii_character = chr(binary_integer) # Convert the integer to its ASCII character
    message += ascii_character
print(message)
One thing I noticed while working with this is that, using your solution and file, there are 2620 bits. That is not divisible by 8, so the string cannot be split cleanly into bytes.
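One possible source of stray bits is that replacing on the hex string can match across byte boundaries (for example, the bytes 0x02 0x0a give the hex text '020a', which contains '20'). A minimal sketch of the same mapping done one byte at a time follows. The names bytes_to_bits and bits_to_text are made up for illustration, and skipping unmapped bytes and dropping an incomplete trailing group are assumptions, not something the question specifies.

```python
# Assumption: bytes outside this table carry no information and are skipped.
BYTE_TO_BIT = {0x20: "0", 0x80: "1", 0xE2: "1", 0x8F: "1"}


def bytes_to_bits(data: bytes) -> str:
    """Map each byte to '0' or '1', one whole byte at a time."""
    return "".join(BYTE_TO_BIT.get(byte, "") for byte in data)


def bits_to_text(bits: str) -> str:
    """Decode complete 8-bit groups; an incomplete trailing group is dropped."""
    usable = len(bits) - len(bits) % 8
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, usable, 8))


# Small in-memory demonstration: two spaces and one 0x80 byte give the bits "001".
print(bytes_to_bits(b"  \x80"))
print(bits_to_text("0100100001101001"))
```

With the real file, the bytes returned by open('TheMessage.txt', 'rb').read() would be passed to bytes_to_bits and the result to bits_to_text.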

Related

I'm having trouble using numpy and handling lists

This code reads a CSV file line by line and counts the occurrences of each Unicode code, but I can't understand the two parts of the code shown below. I've already googled but I couldn't find the answer. Could you give me some advice?
1) Why should I use numpy here instead of []?
emoji_time = np.zeros(200)
2) What does -1 mean?
emoji_time[len(emoji_list)-1] = 1
This is the code result:
0x100039, 47,
0x10002D, 121,
0x100029, 30,
0x100078, 6,
unicode_count.py
import codecs
import re
import numpy as np

file0 = "./message.tsv"
f0 = codecs.open(file0, "r", "utf-8")
list0 = f0.readlines()
f0.close()
print(len(list0))

len_list = len(list0)
emoji_list = []
emoji_time = np.zeros(200)

for i in range(len_list):
    a = "0x1000[0-9A-F][0-9A-F]"
    if "0x1000" in list0[i]: # 0x and 0x1000: same number
        b = re.findall(a, list0[i])
        # print(b)
        for j in range(len(b)):
            if b[j] not in emoji_list:
                emoji_list.append(b[j])
                emoji_time[len(emoji_list)-1] = 1
            else:
                c = emoji_list.index(b[j])
                emoji_time[c] += 1
print(len(emoji_list))
1) If you use a list instead of a numpy array, the counts do not change in this case. You can try it yourself by running the same code with emoji_time = np.zeros(200) replaced by emoji_time = [0]*200.
2) emoji_time[len(emoji_list)-1] = 1. What this line does is the following: when an emoji appears for the first time, its count in emoji_time (the list that holds how many times each emoji occurred) is set to 1. len(emoji_list)-1 selects the position in emoji_time: the emoji has just been appended to emoji_list, so it sits at the last index, which is the length minus 1 because list indexing in Python starts at 0.
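To make the indexing concrete, here is a small sketch that uses two plain lists kept in step, plus an alternative with collections.Counter from the standard library (a different technique from the question's code, shown only for comparison). The function names and the sample lines are made up for illustration.

```python
import re
from collections import Counter

PATTERN = re.compile(r"0x1000[0-9A-F][0-9A-F]")


def count_codes(lines):
    """Parallel lists: emoji_time[k] is the count of emoji_list[k]."""
    emoji_list = []
    emoji_time = []
    for line in lines:
        for code in PATTERN.findall(line):
            if code not in emoji_list:
                emoji_list.append(code)
                # The new code now sits at index len(emoji_list) - 1,
                # and this append puts its count at the same index.
                emoji_time.append(1)
            else:
                emoji_time[emoji_list.index(code)] += 1
    return emoji_list, emoji_time


def count_codes_counter(lines):
    """Same counts using a Counter, which does the bookkeeping itself."""
    return Counter(code for line in lines for code in PATTERN.findall(line))


sample_lines = ["a 0x100039 b 0x10002D", "c 0x100039"]
print(count_codes(sample_lines))
print(count_codes_counter(sample_lines))
```

Growing the count list with append also avoids having to guess a fixed size such as 200 in advance.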

Import to Python a specific format file line per line

How can I import this file, which contains plain text with numbers?
It's difficult to import because the first line contains 7 numbers and the second line contains 8 numbers...
In general:
LINE 1: 7 numbers.
LINE 2: 8 numbers.
LINE 3: 7 numbers.
LINE 4: 8 numbers.
... and so on
I tried to read it but could not import it. I need to save the data in a NumPy array.
filepath = 'CHALLENGE.001'
with open(filepath) as fp:
    line = fp.readline()
    cnt = 1
    while line:
        print("Line {}: {}".format(cnt, line.strip()))
        line = fp.readline()
        cnt += 1
LINK TO DATA
This file contains information for each frequency, as explained below:
You'll have to skip the blank lines when reading as well.
Just check if the first line is blank. If it isn't, read 3 more lines.
Rinse and repeat.
Here's an example of both a numpy array and a pandas dataframe.
import pandas as pd
import numpy as np
filepath = 'CHALLENGE.001'
data = []
headers = ['frequency in Hz',
           'ExHy coherency',
           'ExHy scalar apparent resistivity',
           'ExHy scalar phase',
           'EyHz coherency',
           'EyHx scalar apparent resistivity',
           'EyHx scalar phase',
           're Zxx/√(µo)',
           'im Zxx/√(µo)',
           're Zxy/√(µo)',
           'im Zxy/√(µo)',
           're Zyx/√(µo)',
           'im Zyx/√(µo)',
           're Zyy/√(µo)',
           'im Zyy/√(µo)',
           ]
with open(filepath) as fp:
    while True:
        line = fp.readline()
        if not len(line):  # an empty string means end of file
            break
        fp.readline()  # skip the blank line
        line2 = fp.readline()
        fp.readline()  # skip the blank line
        combined = line.strip().split() + line2.strip().split()
        data.append(combined)
df = pd.DataFrame(data, columns=headers).astype('float')
array = np.array(data).astype(float)  # np.float was removed in NumPy 1.24
# example of type
print(type(df['frequency in Hz'][0]))
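The pairing logic can also be sketched with the standard library only, by dropping every blank line first and then joining the remaining lines two at a time. The helper name read_records and the sample numbers are made up for illustration, and the sketch assumes each record is exactly one 7-number line followed by one 8-number line.

```python
import io


def read_records(fp):
    """Return one list of 15 floats per record (a 7-number line plus an 8-number line)."""
    # Keep only lines that contain something, split into their numeric fields.
    rows = [line.split() for line in fp if line.strip()]
    if len(rows) % 2:
        raise ValueError("expected an even number of non-blank lines")
    # Pair line 1 with line 2, line 3 with line 4, and so on.
    return [[float(x) for x in first + second]
            for first, second in zip(rows[0::2], rows[1::2])]


# Made-up sample in the described layout (not taken from CHALLENGE.001).
sample = io.StringIO(
    "1 2 3 4 5 6 7\n"
    "\n"
    "8 9 10 11 12 13 14 15\n"
    "\n"
    "1.5 2 3 4 5 6 7\n"
    "\n"
    "8 9 10 11 12 13 14 1e3\n"
)
records = read_records(sample)
print(len(records), len(records[0]))
```

Passing the returned list to np.array gives a float array with one row per frequency and 15 columns.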

Python 3.6 - Splitting hex data

I am trying to read a binary file and extract a header which is in UTF-8 format. However, the rest of the file has byte values that go over decimal 127, so I cannot convert that to a string. I have to split off the text up to ; (or 0x3B) and I cannot get it to work.
import binascii
with open("test_qifs_single_frame.qifs", "rb") as file:
    data = file.read()
print(binascii.hexlify(data))
I cannot read it in as a string either, because it tells me that I cannot decode 0x81 to UTF-8. Which I understand, it falls outside of the ASCII range. What can I do to solve this?
You can read the file byte by byte until you reach the stop character, then decode the data that you have read.
Create some sample data
>>> from random import randint
>>> header = 'Heaðer;'.encode('utf-8')
>>> bs = b''.join(bytes.fromhex('{:0>2x}'.format(randint(0, 255))) for _ in range(56))
>>> with open('test_qifs_single_frame.qifs', 'wb') as f:
...     f.write(header + bs)
>>>
Read the header from the file
>>> # Create a bytearray to hold the bytes that we read.
>>> ba = bytearray()
>>> import functools
>>> with open('test_qifs_single_frame.qifs', 'rb') as f:
...     breader = functools.partial(f.read, 1)
...     for b in iter(breader, b';'):
...         ba += b
...
>>> ba
bytearray(b'Hea\xc3\xb0er')
>>> ba.decode('utf-8')
'Heaðer'
If the iter builtin is passed a callable and a value, it will call the callable until it returns the value. In the code we use functools.partial to create a function that reads the file one byte at a time, then pass this to iter.
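If the whole file is already in memory, as in the question's file.read() call, the same split can be sketched with bytes.partition. The helper name split_header and the sample bytes are made up for illustration. Raising an error when no separator is found is a design choice of this sketch: the byte-by-byte loop above would never finish on such a file, because f.read(1) keeps returning b'' at end of file.

```python
def split_header(data: bytes, sep: bytes = b";"):
    """Split raw bytes into (decoded header, remaining payload) at the first separator."""
    header, found, payload = data.partition(sep)
    if not found:
        raise ValueError("separator {!r} not found".format(sep))
    return header.decode("utf-8"), payload


# Made-up sample: a UTF-8 header, the separator, then bytes that are not valid UTF-8.
raw = "Heaðer;".encode("utf-8") + b"\x81\xff;\x00"
header, payload = split_header(raw)
print(header)
print(payload)
```

Searching the raw bytes for 0x3B is safe here because every byte of a multi-byte UTF-8 sequence is 0x80 or higher, so a ; byte inside the header can only be a real semicolon.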

how to calculate the number of unique words just in a part of a file

I have a file in Persian (a Persian sentence, a "tab", then a Persian word, again a "tab" and then an English word). I have to calculate the number of unique words just in Persian sentences and not the Persian and English words after the tabs. Here's the code:
from hazm import *
file = "F.txt"
def WordsProbs (file):
    words = set()
    with open (file, encoding = "utf-8") as f1:
        normalizer = Normalizer()
        for line in f1:
            tmp = line.strip().split("\t")
            words.update(set(normalizer.normalize(tmp[0].split())))
    print(len(words), "unique words")
    print (words)
To access just the sentences I have to split each line by "\t". And to access each word of the sentence I have to split tmp[0]. The problem is, when I run the code the error below occurs. It's because of the split after tmp[0]. But if I omit this split after tmp[0], it just counts the letters not unique words. How can I fix it? (Is there another way to write this code to calculate unique words?).
The error:
Traceback (most recent call last):
  File "C:\Users\yasini\Desktop\16.py", line 15, in <module>
    WordsProbs (file)
  File "C:\Users\yasini\Desktop\16.py", line 10, in WordsProbs
    words.update(set(normalizer.normalize(tmp[0].split())))
  File "C:\Python34\lib\site-packages\hazm\Normalizer.py", line 46, in normalize
    text = self.character_refinement(text)
  File "C:\Python34\lib\site-packages\hazm\Normalizer.py", line 65, in character_refinement
    text = text.translate(self.translations)
AttributeError: 'list' object has no attribute 'translate'
sample file:
https://www.dropbox.com/s/r88hglemg7aot0w/F.txt?dl=0
The problem is that hazm.Normalizer.normalize takes a space-separated string as an argument, NOT a list. You can see an example here under the "Usage" heading.
Call normalize on the string itself and split its result afterwards (passing the unsplit string straight to set() would collect letters instead of words), so that
words.update(set(normalizer.normalize(tmp[0].split())))
becomes
words.update(set(normalizer.normalize(tmp[0]).split()))
and you should be good to go.
I found it myself.
from hazm import *
file = "F.txt"
def WordsProbs (file):
    words = []
    mergelist = []
    with open (file, encoding = "utf-8") as f1:
        normalizer = Normalizer()
        for line in f1:
            line = normalizer.normalize(line)
            tmp = line.strip().split("\t")
            words = tmp[0].split()
            #print(len(words), "unique words")
            #print (words)
            for i in words:
                mergelist.append(i)
    uniq = set(mergelist)
    uniqueWords = len(uniq)
    return uniqueWords  # number of unique words in the sentence column
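The same idea can be sketched without the intermediate list by adding each sentence's words straight into a set. The function name unique_sentence_words is made up, and the normalize parameter is a stand-in so the sketch runs without hazm installed. With hazm, one would presumably pass Normalizer().normalize, as in the code above.

```python
def unique_sentence_words(lines, normalize=lambda text: text):
    """Collect the distinct words of the first tab-separated field of each line."""
    words = set()
    for line in lines:
        # Everything before the first tab is the sentence; the rest is ignored.
        sentence = line.rstrip("\n").split("\t")[0]
        # Normalize the whole string first, then split it into words.
        words.update(normalize(sentence).split())
    return words


# Made-up sample lines in the described layout: sentence, tab, word, tab, word.
sample = ["a b a\tx\ty\n", "b c\tz\tw\n"]
result = unique_sentence_words(sample)
print(len(result), "unique words")
```

Because the words after the tabs are never split or added, they cannot leak into the count.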

Simple /etc/shadow Cracker

I'm trying to get this shadow file cracker working but I keep getting a TypeError: integer required.
I'm sure it's the way I'm using the bytearray function. I've tried creating a new object with bytearray for the "word" and the "salt", but to no avail. Then I tried passing the bytearray constructor to the pbkdf2 function, and still nothing. I will post the code:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import hashlib, binascii
import os,sys
import crypt
import codecs
from datetime import datetime,timedelta
import argparse
today = datetime.today()
# Takes in user and the encrypted passwords and does a simple
# Brute Force Attack using the '==' operator. SHA* is defined by
# a number b/w $, the chars b/w the next $ marker would be the
# rounds, then the salt, and after that the hashed password.
# object.split("some symbol or char")[#], where # is the
# location/index within the list
def testPass(cryptPass,user):
    digest = hashlib.sha512
    dicFile = open ('Dictionary.txt','r')
    ctype = cryptPass.split("$")[1]
    if ctype == '6':
        print "[+] Hash type SHA-512 detected ..."
        print "[+] Be patien ..."
        rounds = cryptPass.split("$")[2].strip('rounds=')
        salt = cryptPass.split("$")[3]
        print "[DEBUG]: " + rounds
        print "[DEBUG]: " + salt
        # insalt = "$" + ctype + "$" + salt + "$" << COMMENTED THIS OUT
        for word in dicFile.readlines():
            word = word.strip('\n')
            print "[DEBUG]: " + word
            cryptWord = hashlib.pbkdf2_hmac(digest().name,bytearray(word, 'utf-8'),bytearray(salt, 'utf-8'), rounds)
            if (cryptWord == cryptPass):
                time = str(datetime.today() - today)
                print "[+] Found password for the user: " + user + " ====> " + word + " Time: "+time+"\n"
                return
        else:
            print "Nothing found, bye!!"
            exit
# argparse is used in main to parse arguments passed by the user.
# Path to shadow file is required as an argument.
def main():
    parse = argparse.ArgumentParser(description='A simple brute force /etc/shadow .')
    parse.add_argument('-f', action='store', dest='path', help='Path to shadow file, example: \'/etc/shadow\'')
    argus=parse.parse_args()
    if argus.path == None:
        parse.print_help()
        exit
    else:
        passFile = open (argus.path,'r', 1) # ADDING A 1 INDICATES A BUFFER OF A
        for line in passFile.readlines(): # SINGLE LINE '1<=INDICATES
            line = line.replace("\n","").split(":") # EXACT BUFFER SIZE
            if not line[1] in [ 'x', '*','!' ]:
                user = line[0]
                cryptPass = line[1]
                testPass(cryptPass,user)
if __name__=="__main__":
    main()
OUTPUT:
[+] Hash type SHA-512 detected ...
[+] Be patien ...
[DEBUG]: 65536
[DEBUG]: A9UiC2ng
[DEBUG]: hellocat
Traceback (most recent call last):
  File "ShadowFileCracker.py", line 63, in <module>
    main()
  File "ShadowFileCracker.py", line 60, in main
    testPass(cryptPass,user)
  File "ShadowFileCracker.py", line 34, in testPass
    cryptWord = hashlib.pbkdf2_hmac(digest().name,bytearray(word, 'utf-8'),bytearray(salt, 'utf-8'), rounds)
TypeError: an integer is required
The rounds variable needs to be an integer, not a string. The correct line should be:
rounds = int(cryptPass.split("$")[2].strip('rounds='))
Also, strip() might not be the best method for removing the leading "rounds=". It will work, but it strips a set of characters and not a string. A slightly better method would be:
rounds = int(cryptPass.split("$")[2].split("=")[1])
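A small sketch of the difference, written for Python 3: it removes the literal prefix instead of a character set and converts the result to an integer. The helper name parse_rounds is made up, and the sample entry is hypothetical (the salt comes from the debug output above, the hash part is a placeholder).

```python
def parse_rounds(crypt_pass: str) -> int:
    """Return the integer after 'rounds=' in the third '$'-separated field."""
    field = crypt_pass.split("$")[2]
    prefix = "rounds="
    if not field.startswith(prefix):
        raise ValueError("no rounds= field in {!r}".format(crypt_pass))
    return int(field[len(prefix):])


# Hypothetical entry: real salt from the question, placeholder hash.
entry = "$6$rounds=65536$A9UiC2ng$placeholderhash"
print(parse_rounds(entry))

# strip() removes any of the characters r, o, u, n, d, s, = from both ends,
# so it only happens to work when the value has none of them at its edges.
print(repr("rounds=sound".strip("rounds=")))
```

For purely numeric values the two approaches agree, since digits are not in the stripped character set. The explicit prefix check also reports entries that have no rounds= field at all.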
