How to get information from text and save it in a variable with Python - python-3.x

So I am trying to make an offline dictionary, and as a source for the words I am using a .txt file. I have some questions related to that. How can I find a specific word in my text file and save it in a variable? Also, does the length of my file matter, and will it affect the speed? This is just a part of my .txt file:
Abendhimmel m вечерно небе.|-|
Abendkasse f Theat вечерна каса.|-|
Abendkleid n вечерна рокля.|-|
Abendland n o.Pl. geh Западът.|-|
The thing that I want is to save the word, for example Abendkasse, and everything else up to the symbol |-| in one variable. Thanks for your help!

I recommend you look at Python's standard library functions (on open files) called readlines() and read(). I don't know how large your file is, but you can usually just read the entire thing into RAM (with read or readlines) and then search through the resulting string. Searching can be done with a regex or just with a simple loop.
The length of your file will matter somewhat, in that opening larger files takes slightly longer. Usually this is still pretty fast, even for large text files. In fact, in many cases it will be faster to read the entire file first, because once it is in RAM, all operations on it will be much faster.
An example:
with open("yourlargetextfile.txt") as f:
    contents = f.readlines()

for line in contents:
    # split every line into parts from one |-| to the next
    parts = line.split("|-|")
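Building on that, a minimal sketch of looking up one headword (assuming each entry starts with the German word and runs up to the |-| separator; find_entry is a hypothetical helper name):

```python
def find_entry(path, word):
    """Return the full entry for `word`, or None if it is not found."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            # each line may hold one or more entries terminated by |-|
            for entry in line.split("|-|"):
                entry = entry.strip()
                # the headword is the first whitespace-separated token
                if entry.split(maxsplit=1)[:1] == [word]:
                    return entry
    return None
```

For example, find_entry("dictionary.txt", "Abendkasse") would return the whole entry string for that word, which you can then keep in a variable.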

Related

python - handle strings in a file with some Japanese in it

I have a .c file that I want to open with Python 3 to update a specific number on a specific line.
It seems like the most common way to do this would be to read the file in and write each line to a temporary file; when I get to the line I want, modify it, then write it to the temp file and keep going. Once I'm done, write the contents of the temp file back to the original file.
The problem I have is that there are Japanese characters in the comments of the file. I know I can still read it in by adding the errors='ignore' argument, which allows me to read the lines, but it gets rid of the Japanese characters completely, and I need to preserve those.
I haven't been able to find a way to do this. Is there any way to read in a file that's part Japanese and part English?
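For what it's worth, a minimal sketch of the usual fix: pass the file's actual encoding explicitly instead of errors='ignore', so the Japanese text survives the round trip. The encoding here is an assumption (Japanese source files are commonly UTF-8 or Shift-JIS), as is the helper name:

```python
def replace_line(path, lineno, new_text, encoding="utf-8"):
    """Rewrite one line (0-based index) in place, preserving all other text.

    The encoding is an assumption; pass "shift_jis" or another codec
    if that is what the file actually uses.
    """
    with open(path, encoding=encoding) as f:
        lines = f.readlines()
    lines[lineno] = new_text + "\n"
    with open(path, "w", encoding=encoding) as f:
        f.writelines(lines)
```

Because the same codec is used for reading and writing, comments in Japanese pass through untouched while only the target line changes.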

How can Python be forced to use raw string equivalent of variable-stored paths on Windows?

It might seem that this question has been asked hundreds of times, but reading every variant of it, it's clear it has never been fully answered, at least not in the context I am experiencing.
I have a filename variable that is being obtained through a dialog (in Blender), and I need to both use the file name and iterate over its directory. The problem is that Python cannot properly convert the backslashes to forward slashes.
Here is the filename: 'D:\scans\testing\2021_12_01_14_41_38\frame_00000.json'
Storing this in a variable yields 'D:\scans\testing\x821_12_01_14_41_38\x0crame_00000.json'.
In other words, once the dialog passes the filename to the variable, nothing more can be done with it. The file itself may be opened, but attempting any other operation on it automatically converts the escape characters.
Here are some other approaches I have tried:
Attempting a find replace using filename.replace('\\','/') yields 'D:/scans\testing\x821_12_01_14_41_38\x0crame_00000.json'.
Using pathlib.Path(filename) yields a WindowsPath object:
WindowsPath('D:/scans\testing\x821_12_01_14_41_38\x0crame_00000.json')
All I need is the directory and the file separated, but even os.path.basename yields
'testing\x821_12_01_14_41_38\x0crame_00000.json'.
Even trying repr(filename) is to no avail. It yields "'D:\\scans\\testing\x821_12_01_14_41_38\x0crame_00000.json'"
re.sub('\\\\','/',filename) yields 'D:/scans\testing\x821_12_01_14_41_38\x0crame_00000.json'
It's mind-boggling that such a simple operation on Windows is so complicated, as I have done it millions of times on Linux (yes, I know). Unfortunately, I cannot use the raw string method (r'string') because this is a variable, not a string literal. I have seen crazy ideas out there such as r'{}'.format(variable), but that doesn't work, for obvious reasons.
I could list hundreds of other failed attempts, including abspath, relpath, and find / replace, and they all lead nowhere. Surely, there is a way to take a full-path filename from a dialog in Windows (in this case, Blender) and split the directory and filename apart?
If you have any ideas how I might work around this problem, please share.
You can try removing the inverted commas from the string when using the variable that has the string stored in it.
I was trying to find file size where file path was chosen by user:
import os

# take input on file path
file_path = input("Enter file path without inverted commas:")
# prints the size of the file in bytes
print(os.path.getsize(file_path))
Note: when I copied the path, it was copied like this:
"D:\Dev\repo\t1_old\task.py"
So I had to remove the inverted commas; only then did os.path.getsize(file_path) work.
If I did not remove the inverted commas while entering the file path, it gave an error.
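As a side note, escape sequences are only interpreted in string literals typed into source code; a path that genuinely arrives from input() or a file dialog already contains real backslashes and splits cleanly. A small sketch (the r'' prefix is needed only because the path here is a literal in the source):

```python
import ntpath  # Windows path rules, usable on any OS

filename = r"D:\scans\testing\2021_12_01_14_41_38\frame_00000.json"
# split into directory and file name
directory, basename = ntpath.split(filename)
print(directory)  # D:\scans\testing\2021_12_01_14_41_38
print(basename)   # frame_00000.json
```

On Windows itself, os.path.split behaves the same way; ntpath is just the portable spelling of the Windows flavor.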

Partially expand VCF bgz file in Linux

I have downloaded gnomAD files from - https://gnomad.broadinstitute.org/downloads
This is the bgz file
https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/genomes/gnomad.genomes.r2.1.1.sites.2.vcf.bgz
When I expand using:
zcat gnomad.genomes.r2.1.1.sites.2.vcf.bgz > gnomad.genomes.r2.1.1.sites.2.vcf
The output VCF file becomes more than 330GB. I do not have that kind of space available on my laptop.
Is there a way where I can just expand - say 1 GB of the bgz file OR just 100000 rows from the bgz file?
From what I've been able to determine, a bgz file is compatible with gzip, and a VCF file is a plain text file. Since it's a gzip file and not a .tar.gz, the solution doesn't require listing any archive contents, which simplifies things a bit.
This can probably be accomplished in several ways, and I doubt this is the best way, but I've been able to successfully decompress the first 100,000 rows into a file using the following code in python3 (it should also work under earlier versions back to 2.7):
#!/usr/bin/env python3
import gzip

ifile = gzip.GzipFile("gnomad.genomes.r2.1.1.sites.2.vcf.bgz")
ofile = open("truncated.vcf", "wb")

LINES_TO_EXTRACT = 100000
for _ in range(LINES_TO_EXTRACT):
    ofile.write(ifile.readline())

ifile.close()
ofile.close()
I tried this on your example file, and the truncated file is about 1.4 GiB. It took about 1 minute 40 seconds on a Raspberry Pi-like computer, so while it's slow, it's not unbearably so.
While this solution is somewhat slow, it's good for your application for the following reasons:
It minimizes disk and memory usage, which could otherwise be problematic with a large file like this.
It cuts the file to exactly the given number of lines, which avoids truncating your output file mid-line.
The three input parameters can be easily parsed from the command line in case you want to make a small CLI utility for parsing other files in this manner.
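To make that reusable, the snippet above can be factored into a function taking those three parameters (the function name is an assumption):

```python
import gzip

def extract_first_lines(src, dst, n_lines):
    """Decompress only the first n_lines of a gzip/bgz file into dst."""
    with gzip.open(src, "rb") as ifile, open(dst, "wb") as ofile:
        for _ in range(n_lines):
            line = ifile.readline()
            if not line:  # source had fewer lines; stop early
                break
            ofile.write(line)
```

Wrapping this in argparse would give the small CLI utility mentioned above, with src, dst, and n_lines as its three command-line parameters.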

Writing updated data to an existing file in Python

The data file I am working with takes the format:
6345 Alfonso Chavez 98745.35
2315 Terry Kulakowski 234.0
4455 Yu Chen 78000.0
What I am trying to do is replace the balance (the last item in the line) with an updated balance that I have generated in my code. I'm not sure how to do this with an existing file without wiping the entire thing first, which is obviously not what I want. I was thinking of a for loop to iterate over the lines and split each into separate list elements, but that would update every user's balance instead of the specific person's. Any help is appreciated.
If this is a text file, there is no great way of doing this. In general it's probably impossible, or at least very hard, to save changes in a text file without rewriting the whole file. Instead, what you should be focusing on is the fact that you need O(n) time to loop through the entire file looking for the specific person.
Having said all that, the Python module fileinput seems like a good way to do this. You can set inplace=True to make it seem like you are changing just that single line in place.
But this is still O(n). It's just secretly rewriting the whole file for you behind your back.
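A minimal sketch of the fileinput approach (the sample data is from the question; the account id and new balance are illustrative):

```python
import fileinput

SAMPLE = """6345 Alfonso Chavez 98745.35
2315 Terry Kulakowski 234.0
4455 Yu Chen 78000.0
"""

def update_balance(path, account_id, new_balance):
    # inplace=True redirects print() back into the file, so the whole
    # file is rewritten line by line behind the scenes
    with fileinput.input(path, inplace=True) as f:
        for line in f:
            fields = line.split()
            if fields and fields[0] == account_id:
                fields[-1] = str(new_balance)  # swap in the new balance
            print(" ".join(fields))

with open("accounts.txt", "w") as f:
    f.write(SAMPLE)
update_balance("accounts.txt", "2315", 500.0)
```

Only the matching account's line is changed; the other lines are printed back unmodified.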

Read a text file to a string using fortran77

Is it possible to read a text file into a string using Fortran 77?
I actually have a text file in the following format
Some comments
Some comments
n1 m1 comment_with_unknown_number_of_words
..m1 lines of data..
n2 m2 comment_with_unknown_number_of_words
..m2 lines of data..
and so on
where n1, n2, ... are the orders of the objects, and m1, m2, ... are the numbers of lines containing the data about these objects, respectively. I also want to store the comment for each object for further investigation.
How can I deal with this? Thank you so much in advance!
I can't believe nobody called me on this. My apologies: this in fact only grabs the first word of the comment...
------------original answer----
Not to recommend F77, but this isn't that tough a problem either. Just declare a character variable long enough to hold your longest comment and use a list-directed read.
      integer m1,n1
      character*80 comment
      ...
      read(unit,*)m1,n1,comment
If you want to write it back out without padding a bunch of extra spaces, that's a bit of effort, but hardly the end of the world.
What you cannot do at all in F77 is discern whether your file has trailing blanks at the end of a line, unless you go to direct-access reading.
------------improved answer
What you need to do is read the whole line as a string, then read your integers from the string:
      read(unit,'(a)')comment
      read(comment,*)m1,n1
At this point comment contains the whole line, including your two integers (perhaps that will do the job for you). If you want to pull off the actual string, it requires a bit of coding (I have a ~40-line subroutine to split the string into words). I could post it if interested, but I'm more inclined, as others are, to encourage you to see if your code will work with a more modern compiler.
