Python 3 Opening Binary into a list - python-3.x

I have a binary file consisting only of hex numbers.
I want to open the file and create a list with each element of the list consisting
of 1 hexanumber of the file (e.g. 1 byte => AB for example would be 1 element).
I tried it with the "with open" and "readlines" commands and then split the lines into the element size i wanted but failed.
Also it somehow didn´t include a specific Hex number (in my case 0A).
my code is
with open(r"C:\Users\James\Desktop\Test1.bin","rb") as file:
fileread = file.read
linesread = file.readlines()
splitted = linesread.split('\\')
print(splitted)
How do i go about?
Thanks for help

Related

How to read many files have a specific format in python

I am a little bit confused in how to read all lines in many files where the file names have format from "datalog.txt.98" to "datalog.txt.120".
This is my code:
import json
file = "datalog.txt."
i = 97
for line in file:
i+=1
f = open (line + str (i),'r')
for row in f:
print (row)
Here, you will find an example of one line in one of those files:
I need really to your help
I suggest using a loop for opening multiple files with different formats.
To better understand this project I would recommend researching the following topics
for loops,
String manipulation,
Opening a file and reading its content,
List manipulation,
String parsing.
This is one of my favourite beginner guides.
To set the parameters of the integers at the end of the file name I would look into python for loops.
I think this is what you are trying to do
# create a list to store all your file content
files_content = []
# the prefix is of type string
filename_prefix = "datalog.txt."
# loop from 0 to 13
for i in range(0,14):
# make the filename variable with the prefix and
# the integer i which you need to convert to a string type
filename = filename_prefix + str(i)
# open the file read all the lines to a variable
with open(filename) as f:
content = f.readlines()
# append the file content to the files_content list
files_content.append(content)
To get rid of white space from file parsing add the missing line
content = [x.strip() for x in content]
files_content.append(content)
Here's an example of printing out files_content
for file in files_content:
print(file)

Splitting up a large file into smaller files at specific points

I know this question has been asked several times. But those solutions really don't help me here. I have a really big file (5GB almost) to read, get the data and give it to my neural network. I have to read line by line. At first I loaded the entire file into the memory using .readlines() function but it obviously resulted in out-of-memory issue. Next I instead of loading the entire file into the memory, I read it line by line but it still hasn't worked. So now I am thinking to split my file into smaller files and then read each of those files. The file format that for each sequence I have a header starting with '>' followed by a sequence for example:
>seq1
acgtccgttagggtjhtttttttttt
tttsggggggtattttttttt
>seq2
accggattttttstttttttttaasftttttttt
stttttttttttttttttttttttsttattattat
tttttttttttttttt
>seq3
aa
.
.
.
>seqN
bbbbaatatattatatatatattatatat
tatatattatatatattatatatattatat
tatattatatattatatatattatatatatta
tatatatatattatatatatatatattatatat
tatatatattatatattatattatatatattata
tatatattatatattatatatattatatatatta
So now I want to split my file which has 12700000 sequences into smaller files such that for each file with header '>' has it's correct corresponding sequence as well. How can I achieve this in python without running into memory issues. Insights would be appreciated.
I was able to do this with 12,700,000 randomized lines with 1-20 random characters in each line. Though the size of my file was far less than 5GB (roughly 300MB)--likely due to format. All of that said, you can try this:
x = 0
y = 1
string = ""
cycle = "Seq1"
with open(f"{FILEPATH}/main.txt", "r") as file:
for line in file:
if line[0] == ">":
if x % 5000 == 0 and x != 0:
with open(f"{FILEPATH}/Sequence Files/Starting{cycle}.txt", "a") as newfile:
newfile.writelines(string)
cycle = f"Seq{y*5000+1}"
y += 1
string = ""
string += line
x += 1
if line[0] != ">":
string += line
with open(f"{FILEPATH}/Sequence Files/Starting{cycle}.txt", "a") as newfile:
newfile.writelines(string)
This will read the file line-by-line, append the first 5000 values to a string, write the string to a new file, and repeat for the rest of the original file. It will also name the file with the first sequence within the file.
The line that reads if x % 5000 == 0: is the line that defines the number of sequences within each file and the line cycle = "Seq" + str(y*5000+1) creates the formatting for the next filename. You can adjust the 5000 in these if you change your mind about how many sequences per file (you're creating 2,540 new files this way).

How to find a substring in a line from a text file and add that line or the characters after the searched string into a list using Python?

I have a MIB dataset which is around 10k lines. I want to find a certain string (for eg: "SNMPv2-MIB::sysORID") in the text file and add the whole line into a list. I am using Jupyter Notebooks for running the code.
I used the below code to search the search string and it print the searched string along with the next two strings.
basic = open('mibdata.txt')
file = basic.read()
city_name = re.search(r"SNMPv2-MIB::sysORID(?:[^a-zA-Z'-]+[a-zA-Z'-]+) {1,2}", file)
city_name = city_name.group()
print(city_name)
Sample lines in file:
SNMPv2-MIB::sysORID.10 = OID: NOTIFICATION-LOG-MIB::notificationLogMIB
SNMPv2-MIB::sysORDescr.1 = STRING: The MIB for Message Processing and Dispatching.
The output expected is
SNMPv2-MIB::sysORID.10 = OID: NOTIFICATION-LOG-MIB::notificationLogMIB
but i get only
SNMPv2-MIB::sysORID.10 = OID: NOTIFICATION-LOG-MIB
The problem with changing the number of string after the searched strings is that the number of strings in each line is different and i cannot specify a constant. Instead i want to use '\n' as a delimiter but I could not find one such post.
P.S. Any other solution is also welcome
EDIT
You can read all lines one by one of the file and look for a certain Regex that matches the case.
r(NMPv2-MIB::sysORID).* finds the encounter of the string in the parenthesis and then matches everything followed after.
import re
basic = open('file.txt')
entries = map(lambda x : re.search(r"(SNMPv2-MIB::sys).*",x).group() if re.search(r"(SNMPv2-MIB::sys).*",x) is not None else "", basic.readlines())
non_empty_entries = list(filter(lambda x : x is not "", entries))
print(non_empty_entries)
If you are not comfortable with Lambdas, what the above script does is
taking the text from the file, splits it into lines and checks all lines individually for a regex match.
Entries is a list of all lines where the match was encountered.
EDIT vol2
Now when the regex doesn't match it will add an empty string and after we filter them out.

Replacing a float number in txt file

Firstly, I would like to say that I am newbie in Python.
I will ll try to explain my problem as best as I can.
The main aim of the code is to be able to read, modify and copy a txt file.
In order to do that I would like to split the problem up in three different steps.
1 - Copy the first N lines into a new txt file (CopyFile), exactly as they are in the original file (OrigFile)
2 - Access to a specific line where I want to change a float number for other. I want to append this line to CopyFile.
3 - Copy the rest of the OrigFile from line in point 2 to the end of the file.
At the moment I have been able to do step 1 with next code:
with open("OrigFile.txt") as myfile:
head = [next(myfile) for x iin range(10)] #read first 10 lines of txt file
copy = open("CopyFile.txt", "w") #create a txt file named CopyFile.txt
copy.write("".join(head)) #convert list into str
copy.close #close txt file
For the second step, my idea is to access directly to the txt line I am interested in and recognize the float number I would like to change. Code:
line11 = linecache.getline("OrigFile.txt", 11) #opening and accessing directly to line 11
FltNmb = re.findall("\d+\.\d+", line11) #regular expressions to identify float numbers
My problem comes when I need to change FltNmb for a new one, taking into consideration that I need to specify it inside the line11. How could I achieve that?
Open both files and write each line sequentially while incrementing line counter.
Condition for line 11 to replace the float number. Rest of the lines are written without modifications:
with open("CopyFile.txt", "w") as newfile:
with open("OrigFile.txt") as myfile:
linecounter = 1
for line in myfile:
if linecounter == 11:
newline = re.sub("^(\d+\.\d+)", "<new number>", line)
linecounter += 1
outfile.write(newline)
else:
newfile.write(line)
linecounter += 1

Writing to files in ASCII with Python3, not UTF8

I have a program that I created with two sections.
The first one copies a text file with an integer in the middle of the file name in this format.
file = "Filename" + "str(int)" + ".txt"
the user can create as many copies of the file that they would like.
The second part of the program is what I am having the problem with. There is an integer at the very bottom of the file that is to correspond with the integer in the file name. After the first part is done, I open each file one at a time in "r+" read/write format. So I can file.seek(1000) to about where the integer is in the file.
Now in my opinion the next part should be easy. I should just simply have to write str(int) into the file right here. But it wasn't that easy. It worked just fine doing it like that in Linux at home, but at work on Windows it proved difficult. What I ended up having to do after file.seek(1000) is write to the file using Unicode UTF-8. I accomplished this with this code snippet of the rest of the program. I will document it so that it is able to be understood what is going on. Instead of having to write this in Unicode, I would love to be able to write this in good old regular English ASCII characters. Eventually this program will be expanded to include a lot more data at the bottom of each file. Having to write the data in Unicode is going to make things extremely difficult. If I just write the data without turning it into Unicode this is the result. This string is supposed to say #2 =1534, instead it says #2 =ㄠ㌵433.
If someone can show me what I am doing wrong that would be great. I would love to just use something like file.write('1534') to write the data to the file instead of having to do it in Unicode UTF-8.
while a1 < d1 :
file = "file" + str(a1) + ".par"
f = open(file, "r+")
f.seek(1011)
data = f.read() #reads the data from that point in the file into a variable.
numList= list(str(a1)) # "a1" is the integer in the file name. I had to turn the integer into a list to accomplish the next task.
replaceData = '\x00' + numList[0] + '\x00' + numList[1] + '\x00' + numList[2] + '\x00' + numList[3] + '\x00' #This line turns the integer into Utf 8 Unicode. I am by no means a Unicode expert.
currentData = data #probably didn't need to be done now that I'm looking at this.
data = data.replace(currentData, replaceData) #replaces the Utf 8 string in the "data" variable with the new Utf 8 string in "replaceData."
f.seek(1011) # Return to where I need to be in the file to write the data.
f.write(data) # Write the new Unicode data to the file
f.close() #close the file
f.close() #make sure the file is closed (sometimes it seems that this fails in Windows.)
a1 += 1 #advances the integer, and then return to the top of the loop
This is an example of writing to a file in ASCII. You need to open the file in byte mode, and using the .encode method for strings is a convenient way to get the end result you want.
s = '12345'
ascii = s.encode('ascii')
with open('somefile', 'wb') as f:
f.write(ascii)
You can obviously also open in rb+ (read and write byte mode) in your case if the file already exists.
with open('somefile', 'rb+') as f:
existing = f.read()
f.write(b'ascii without encoding!')
You can also just pass string literals with the b prefix, and they will be encoded with ascii as shown in the second example.

Resources