python - handle strings in a file with some Japanese in it - python-3.x

I have a .c file that I want to open with python 3 to update a specific number on a specific line.
It seems like the most common way to do this is to read the file in and write each line to a temporary file; when I reach the line I want, I modify it, write it to the temp file, and keep going. Once I'm done, I write the contents of the temp file back to the original file.
The problem is that the comments in the file contain Japanese characters. I know I can still read the file by passing the errors="ignore" argument, which lets me read the lines in, but it drops the Japanese characters completely and I need to preserve them.
I haven't been able to find a way to do this. Is there any way to read in a file that is partly in Japanese and partly in English?
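One possible approach, sketched under the assumption that the file's encoding is known: pass that encoding to open() instead of errors="ignore", so the Japanese text is decoded and re-encoded rather than discarded. Shift_JIS is used below only as a guess (the real file might be UTF-8 or EUC-JP), and update_line and its parameters are hypothetical names.

```python
import os
import tempfile


def update_line(path, line_number, transform, encoding="shift_jis"):
    """Rewrite one line of a text file and leave every other line intact.

    `encoding` is an assumption: use whatever encoding the file really has.
    """
    # newline="" keeps the original line endings (\n or \r\n) untouched.
    with open(path, encoding=encoding, newline="") as src:
        lines = src.readlines()

    # line_number is 1-based, as shown in an editor.
    lines[line_number - 1] = transform(lines[line_number - 1])

    # Write to a temporary file in the same directory, then swap it in.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding=encoding, newline="") as dst:
        dst.writelines(lines)
    os.replace(tmp_path, path)


# Small demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as d:
    demo = os.path.join(d, "demo.c")
    original = "/* 日本語のコメント */\r\n#define LIMIT 100\r\n"
    with open(demo, "w", encoding="shift_jis", newline="") as f:
        f.write(original)
    update_line(demo, 2, lambda s: s.replace("100", "200"))
    with open(demo, encoding="shift_jis", newline="") as f:
        result = f.read()
```

If the encoding is wrong, open() raises UnicodeDecodeError instead of silently dropping characters, which is usually the safer failure mode here.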

Related

How to get information from text and save it in a variable with Python

So I am trying to make an offline dictionary and as a source for the words, I am using a .txt file. I have some questions related to that. How can I find a specific word in my text file and save it in a variable? Also does the length of my file matter and will it affect the speed? That's just a part of my .txt file:
Abendhimmel m вечерно небе.|-|
Abendkasse f Theat вечерна каса.|-|
Abendkleid n вечерна рокля.|-|
Abendland n o.Pl. geh Западът.|-|
What I want is to save the word, for example Abendkasse, and everything else up to the |-| symbol in one variable. Thanks for your help!
I recommend looking at the file-object methods readlines() and read(). I don't know how large your file is, but you can usually just read the entire thing into RAM (with read or readlines) and then search through the resulting string. Searching can be done with a regex or with a simple loop.
The length of your file will matter somewhat, in that opening larger files will take slightly longer. Usually this is still pretty fast, even for large text files. In fact, I think in many cases it will be faster to read the entire file first, because once it is in RAM, all operations on it will be much faster.
An example:
# encoding="utf-8" is an assumption; use the file's real encoding
with open("yourlargetextfile.txt", encoding="utf-8") as f:
    contents = f.readlines()
for line in contents:
    # split every line into parts from |-| to the next |-|
    parts = line.split("|-|")
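Building on the loop above, here is a minimal sketch of how the entries could be stored for lookup. It assumes that each entry ends with |-| and that the headword is the first whitespace-separated token; parse_entries is a hypothetical helper, not something from the question.

```python
def parse_entries(text):
    """Map each headword to its full entry (the text before the |-| marker)."""
    entries = {}
    for raw in text.split("|-|"):
        entry = raw.strip()
        if not entry:
            continue
        # Assumption: the headword is the first whitespace-separated token.
        headword = entry.split(maxsplit=1)[0]
        entries[headword] = entry
    return entries


sample = (
    "Abendhimmel m вечерно небе.|-|\n"
    "Abendkasse f Theat вечерна каса.|-|\n"
)
entries = parse_entries(sample)
found = entries["Abendkasse"]  # a dictionary lookup, no scan of the text
```

With a real file, the text would come from open(path, encoding="utf-8").read(), where the encoding is again an assumption. Once the dictionary is built, each lookup takes roughly constant time, so file length mostly affects the initial load.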

Spool do not display information about alter while error

I have a few lines from a file that have been loaded into a list. The file contains a line that starts with EKO-1223... I would like to get this line, so I use a while loop to iterate over the lines in the list. I call nPos = StrFind(svLine, "EKO"), but nPos is < 0, so it does not find the match. Why?
If you want a good answer, you'll need to provide more details on how you get the text from the file into svLine, as well as anything you know about the file's encoding, etc. (If you know nothing about encodings, a hex dump of the first few bytes of the file, as well as those including the EKO- may suffice to identify it.)
My guess is either you haven't properly loaded svLine at all, or that the encoding was misidentified, and thus svLine contains something like "E\0K\0O\0..." or "䭅ⵏ" (that is "\x4b45\x2d4f" in C notation). Can you confirm with a message box or in the debugger?
One alternative you could consider is calling FileGrep. This could help if your code didn't load the file at all, but is unlikely to handle encodings any better. If it's improperly detected encoding and you can change the file, ensure the file has the correct BOM for its encoding. But if you don't control the file, I'm not sure what to recommend. Binary reads and manual decoding (possibly leveraging Kernel32.MultiByteToWideChar) might be your best bet.
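The two failure modes guessed at above can be reproduced outside InstallScript. The following Python sketch is only an illustration of the encoding mismatch, not of how InstallScript loads files; sniff_bom is a hypothetical helper that checks just the UTF-8 and UTF-16 byte order marks.

```python
import codecs

raw_ascii = "EKO-1223".encode("ascii")
raw_utf16 = "EKO-1223".encode("utf-16-le")

# UTF-16 bytes read as single-byte text: a NUL follows every letter.
as_single_byte = raw_utf16.decode("latin-1")
# ASCII bytes read as UTF-16: each pair of letters fuses into one character.
as_utf16 = raw_ascii[:4].decode("utf-16-le")

found_wrong = as_single_byte.find("EKO")                 # the NULs break the match
found_right = raw_utf16.decode("utf-16-le").find("EKO")  # correct decoding


def sniff_bom(data):
    """Return a codec name if `data` starts with a known BOM, else None.

    UTF-32 is deliberately ignored in this sketch.
    """
    for bom, name in (
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    ):
        if data.startswith(bom):
            return name
    return None
```

A binary read followed by this kind of check is one way to decide how to decode the bytes before searching them.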

Checking for EOF using shell script

I have a project that involves extracting data from a database into a text file and then ingesting it into Hadoop. I want to create a shell script that NiFi can run automatically to check whether a text file has been extracted and then ingest it, but I need to make sure that all the data has been extracted before ingesting it. That means I would need to check that the text file has an EOF. How do I do that?
I don't have any code yet; I have very little experience writing shell scripts.
While creating the file, use a different name. Rename it to the expected name once the extraction is done. Then, the other process can start its work once the file exists.
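A minimal sketch of this rename approach, written in Python rather than shell. The ".part" suffix and both function names are hypothetical, and the atomicity claim assumes the temporary and final names are on the same filesystem.

```python
import os
import tempfile


def write_then_rename(final_path, chunks):
    """Write data under a temporary name, then rename it into place."""
    tmp_path = final_path + ".part"  # hypothetical suffix for in-progress files
    with open(tmp_path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
    # On the same filesystem the rename is atomic, so a reader sees either
    # no file at all or the complete file.
    os.replace(tmp_path, final_path)


def ready_for_ingest(final_path):
    """The consumer only needs to check that the final name exists."""
    return os.path.exists(final_path)


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "extract.txt")
    before = ready_for_ingest(target)
    write_then_rename(target, [b"row1\n", b"row2\n"])
    after = ready_for_ingest(target)
```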
EOF is not something that actually gets put in the text file - in fact, there isn't really any EOF value. EOF or end-of-file is a condition that occurs when you try to consume input from a source that has none to give.
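To illustrate the point that EOF is a condition rather than a stored byte, here is a small Python sketch in which an in-memory stream stands in for a file on disk. The stream holds exactly three bytes, and end-of-file only shows up as a read that returns nothing.

```python
import io

stream = io.BytesIO(b"abc")  # stands in for a file on disk
first = stream.read(2)       # two bytes are available
second = stream.read(2)      # only one byte is left, so one byte comes back
third = stream.read(2)       # nothing left: an empty result signals end-of-file
```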
There is no general marker you can look for in your text files that will tell you whether they are complete. You'll need to make your script indicate when a given chunk of data has been extracted in some other way. There are many possibilities; you could change the name of the file as choroba suggested, or you could create a lock file and remove it once the data extraction is done, or you could have your extraction program write a distinctive sequence of bytes to the file at the end, or so on.
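The "distinctive sequence of bytes" option could be checked along these lines. This is only a sketch: END_MARKER is a hypothetical value that the extraction program would have to append, and it must never occur at the end of a partially written file.

```python
import io
import os

END_MARKER = b"#END-OF-EXTRACT\n"  # hypothetical; written last by the extractor


def has_end_marker(f):
    """Check whether a binary file object ends with END_MARKER."""
    f.seek(0, os.SEEK_END)
    size = f.tell()
    if size < len(END_MARKER):
        return False
    # Read only the tail of the file, not the whole extract.
    f.seek(size - len(END_MARKER))
    return f.read() == END_MARKER


partial = io.BytesIO(b"row1\nrow2\n")
complete = io.BytesIO(b"row1\nrow2\n" + END_MARKER)
```

The ingesting side then has to strip the marker before using the data, which is one reason the rename approach is often simpler.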

Is it possible to insert a file into an exe?

I need to insert a generated file into an exe at the time of download. Currently, I create an "empty" file (filled with a repeating character) and package that with the exe. When it comes time to download, I look at the bytes for the installer, find the file by looking for the repeating character, and insert the generated file.
This process however is not working. The repeating character just does not show in the bytes. But I am certain the file is there as it is unpacked if I run the exe. Am I doing something wrong or is inserting a file into an exe even possible?
Also note that I am using Inno Setup Script v5.5.1 to compile the project into an exe.
If you want to change the contents of a file specified in a [Files] entry and compiled into the setup executable, then you must:
1. Make a dummy file that is at least as large as the largest content you will want to insert.
2. Fill the file (or at least the first 64 bytes or so) with something unique and easily distinguishable.
3. Mark its [Files] entry with the "nocompression noencryption dontverifychecksum" flags.
You should then be able to scan the resulting executable for the marker in #2 and then substitute the data that you want. Note however that doing this might invalidate any digital signature on the setup file, although I haven't tested this to be sure.
Note that if the content you are inserting is smaller than the dummy file size, the extra bytes will still remain on the end of your inserted content. So whatever reads the file will have to have some way to ignore that or to recognise the end of the interesting content.
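The scan-and-substitute step might look roughly like the sketch below. It assumes the dummy file really is stored uncompressed and contiguous in the executable, as the flags above are meant to ensure; patch_placeholder, its parameters, and the choice to pad with NUL bytes are all hypothetical.

```python
def patch_placeholder(blob, marker, slot_size, payload, pad=b"\x00"):
    """Overwrite a fixed-size placeholder inside `blob` with `payload`.

    `marker` is the unique start of the dummy file and `slot_size` is the
    dummy file's full length. The result has the same length as `blob`.
    `pad` is expected to be a single byte.
    """
    if len(payload) > slot_size:
        raise ValueError("payload is larger than the placeholder")
    start = blob.find(marker)
    if start < 0:
        raise ValueError("marker not found (is the file stored compressed?)")
    if start + slot_size > len(blob):
        raise ValueError("placeholder runs past the end of the data")
    # Pad so that the bytes after the payload are predictable filler rather
    # than leftovers of the dummy file.
    filler = pad * (slot_size - len(payload))
    return blob[:start] + payload + filler + blob[start + slot_size:]


# Tiny in-memory stand-in for a setup executable with a 16-byte placeholder.
blob = b"HEADER" + b"@@MARK@@" + b"x" * 8 + b"TRAILER"
patched = patch_placeholder(blob, b"@@MARK@@", 16, b"hello")
```

Keeping the total length unchanged matters because offsets stored elsewhere in the executable would otherwise no longer line up.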
So, if you are making changes to an existing exe file and the text is short, you can probably use a hex editor and make the changes at the desired location. If the text is longer, you might want to include some meaningless bytes as filler.

I want to change the way text is represented internally in ANY Text Editor

I want to use an algorithm to reduce the memory used to save a particular text file. I don't really know how text is stored, but I have an idea in mind.
Would it be better to extend an open source text editor (if yes, then which one) or to write a text editor myself?
It would be nice if someone could also give me a link or tutorial on the basics of how text editors work and how data is stored.
Edited to add
To clarify, what I wanted to do is instead of saving duplicates of a word make a hash table and store the address where it needs to be placed.
That way I wouldn't be storing the duplicates.
This would have become specific to a particular text editor.
Update
Thanks everyone, I got what all of you are trying to say. Anyway, all I wanted to do is, instead of saving duplicates of a word, make a hash table and store the address where each word needs to be placed.
This way I wouldn't be storing the duplicates.
Yes, and this would have become specific to a particular text editor. I never realized that.
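For concreteness, the idea described in the question could be sketched as follows. This is one interpretation, not an established format: each distinct token is stored once in a table and the text becomes a list of indices into it. Both function names are hypothetical.

```python
import re


def encode_words(text):
    """Store each distinct token once and describe the text as indices."""
    # Alternating runs of word and non-word characters cover the whole text.
    tokens = re.findall(r"\w+|\W+", text)
    table = {}
    indices = []
    for tok in tokens:
        # A new token gets the next free index; a repeat reuses its index.
        indices.append(table.setdefault(tok, len(table)))
    words = list(table)  # insertion order matches the assigned indices
    return words, indices


def decode_words(words, indices):
    """Rebuild the original text from the table and the index list."""
    return "".join(words[i] for i in indices)


text = "the cat saw the other cat"
words, indices = encode_words(text)
restored = decode_words(words, indices)
```

Note that the index list itself takes space, and the saved result is no longer a plain text file that other editors can open.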
"I want to use an algorithm to reduce memory used to save the particular text file"
If you did this you would no longer have a text editor, but instead you would have created some sort of binary file editor.
The whole point of the text file format is that it is universal, meaning any text file can be opened in any other text editor.
Emacs handles compression transparently. Just create a text file with .gz extension. Emacs will automatically compress contents of the file during save operation, and decompress when you open the file next time.
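The same compress-on-save, decompress-on-open behaviour can be sketched with Python's gzip module. This illustrates the general idea only, not how Emacs implements it; the file name and sample text are made up.

```python
import gzip
import os
import tempfile

text = "the same line repeats\n" * 1000

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "notes.txt.gz")
    # "Save": the text is compressed as it is written.
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write(text)
    compressed_size = os.path.getsize(path)
    # "Open": the text is decompressed as it is read.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        restored = f.read()
```

For repetitive text like this sample, the compressed file is far smaller than the original, and the result is still a standard .gz file that other tools can read.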
Text is basically stored as-is, i.e., every character takes up a byte or two (wide chars), depending on the encoding, and no conversion is done on it when it's saved. Some editors might add an end-of-file character or something similar, though. Don't try coming up with your own algorithm to compress these files. That's why zip files and other archives were created. They're really good at compressing text. If you wanted to add this feature to your text editor, you'd have to add some sort of post-save hook to zip the file, and then put a hook on the open command to unzip it, unless you wanted to do it by hand every time. Don't try writing the text editor yourself from scratch, unless (maybe) you're writing something like Notepad. Text editors with syntax highlighting aren't very easy to make, even with the proper libraries. I'd say write a plugin for something like Visual Studio, or find an open-source text editor.

Resources