opening a gzipped file: characters following three pipes ("|||") are not visible - python-3.x

My input file is a gzipped file containing genomic information. I'm trying to parse the content on a line-by-line basis and have run into a strange problem.
Any given line looks something like this:
AC=26;AF=0.00519169;AN=5008;NS=2504;DP=17308;EAS_AF=0;AMR_AF=0.0072;AFR_AF=0.0015;EUR_AF=0.0109;SAS_AF=0.0082;AA=A|||;VT=SNP
However, when I print out what is being read in...
import gzip

with gzip.open('myfile.gz', 'rt') as f:
    for line in f:
        print(line)
The line looks like this:
AC=26;AF=0.00519169;AN=5008;NS=2504;DP=17308;EAS_AF=0;AMR_AF=0.0072;AFR_AF=0.0015;EUR_AF=0.0109;SAS_AF=0.0082;AA=A|||
Whatever information comes after the "|||" has been truncated.
Moreover, I can't even search the lines for strings that follow the "|||" (e.g., "VT=SNP" in line always returns False), and line.strip("|||") has no effect either.
Any advice on what is causing this or what I need to look at?
Thank you for any help
EDIT: OK, it looks like there was something wrong with the gzip file. I uncompressed it and the script ran fine. Then I recompressed it and the script again ran fine (using gzip.open). Is there any straightforward way to compare the two compressed files (i.e., the one that doesn't get read properly vs. the one that works) so that I might get a hint at the root cause?
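One straightforward check, as a minimal sketch: decompress both archives in Python and diff their text content line by line. The names broken.gz and working.gz are hypothetical stand-ins for the copy that misreads and the recompressed copy that works:
import gzip

with gzip.open('broken.gz', 'rt') as a, gzip.open('working.gz', 'rt') as b:
    for n, (line_a, line_b) in enumerate(zip(a, b), start=1):
        if line_a != line_b:
            print(f'first difference at line {n}')
            print(repr(line_a))  # repr() makes control characters visible
            print(repr(line_b))
            break
From a shell, comparing the output of gunzip -c for each file with diff does the same job.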

Related

.txt file is opening but prints nothing

I'm trying to open a text file and print it as a string. I've made sure there is text in the .txt file, but when I run the code it just prints an empty space. I don't know what to do at this point since I couldn't find anything that could help me with my problem.
with open('test.txt', 'r') as file:
    data = file.read().rstrip()
    print(data)
When things aren't opening, check the following:
The filename in your code is exactly the same as the one you saved. "file.txt" is not the same as "File.txt" for Python (the same goes for accents and special characters).
The file you are trying to read is in the same directory as your script. If your code is at users/bla/documents/another_folder and you pass only the file name to your code, then the file must be at users/bla/documents/another_folder too. If not, be sure to include the path in the string, as in "path/to/your/file/file.txt" (see the short sketch after this checklist).
Make sure that the .txt extension matches your file's actual extension.
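A minimal sketch for checking the second point, assuming Python 3.6+ for pathlib; test.txt is just the filename from the question:
from pathlib import Path

p = Path('test.txt')
print('working directory:', Path.cwd())  # where Python is actually looking
print('file exists here?', p.exists())   # False means the name or path is wrong
print('resolved path:', p.resolve())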
If you checked that but everything seems correct, try:
with open(path_to_file) as f:
    contents = f.readlines()
And see if "contents" has something.
I think it is better if you use the open("file.txt", "r") function to do it. So your code will look like this:
file = open("test.txt", "r")
data = file.read().strip()
print(data)
file.close()  # close the file explicitly when not using a with block

python - handle strings in a file with some Japanese in it

I have a .c file that I want to open with python 3 to update a specific number on a specific line.
It seems like the most common way to do this is to read the file in and write each line to a temporary file; when I get to the line I want, modify it, write it to the temp file, and keep going. Once I'm done, write the contents of the temp file back to the original file.
The problem I have is that the comments in the file contain Japanese characters. I know I can still read it in by passing errors='ignore' to open(), which lets me read the lines, but it strips the Japanese characters completely and I need to preserve those.
I haven't been able to find a way how to do this. Is there any way to read in a file that's part in Japanese and part in English?
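A minimal sketch of the read/modify/write approach described above, assuming the file is UTF-8 encoded (if it is actually Shift JIS or EUC-JP, change the encoding argument); example.c, the line number, and the replacement text are hypothetical placeholders:
import shutil
import tempfile

src = 'example.c'    # hypothetical filename
target_line = 42     # hypothetical line to change

# Read and write with an explicit encoding instead of errors='ignore'
# so the Japanese comments survive the round trip unchanged.
with open(src, encoding='utf-8') as fin, \
        tempfile.NamedTemporaryFile('w', encoding='utf-8',
                                    delete=False, suffix='.c') as tmp:
    for n, line in enumerate(fin, start=1):
        if n == target_line:
            line = line.replace('100', '200')  # hypothetical edit
        tmp.write(line)

shutil.move(tmp.name, src)  # overwrite the original with the updated copy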

Why isn't 'for line in file' copying all lines in my text file?

I wrote some code to pull certain lines from a large text file and noticed some strange things missing, so I ran the following code to make sure the for loop was actually hitting every line in the file:
xf=open("bigFile.txt", r)
xxf=open("newFile.txt",w)
for line in xf:
xxf.write(line)
This ends up not copying all the lines for some reason. Could anyone tell me what I'm not understanding or doing wrong? It ends up making a file only about 60-70% as big as it should be. Any insight would be greatly appreciated.
EDIT: Thanks for the input skrrgwasme & Shreevardhan. To clarify, my ultimate goal is not just to copy the file; in my working code I put some comparison operators before writing the line, for example:
for line in xf:
    firstChar = line[:1]
    if firstChar == 1:
        xxf.write(line)
That is why I am using the "for line in file". Should I do this some other way?
To copy a file, it's better to use functions from the shutil module, like copyfile(), copy(), or copy2().
For example
from shutil import copyfile, copy2
copyfile('bigFile.txt', 'newFile.txt')
or
copy2('bigFile.txt', 'newFile.txt')
You need to close your file. There's no guarantee that buffers you're writing into are being flushed to disk before your script exits. You can do this very easily by using a context manager:
with open("bigFile.txt") as xf, open("newFile.txt", "w") as xxf:
for line in xf:
xxf.write(line)
In your current code, you would write xf.close() and xxf.close(), but using a context manager like this will handle it for you, and even close the files if an exception occurs.
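Spelled out without the context manager, a minimal sketch of the explicit-close version that sentence describes (try/finally plays the role the with statement plays above):
xf = open("bigFile.txt", "r")
xxf = open("newFile.txt", "w")
try:
    for line in xf:
        xxf.write(line)
finally:
    xxf.close()  # flushes any buffered data to disk
    xf.close()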
Also, if you really are simply copying the file, you can use shutil.copyfile().

Can gulp collect grepped lines into a single output file?

I'm trying to filter out lines from all .js source files and put them into a separate file. (Specifically, I'm trying to grep all calls to a string-translation function and post-process them.)
I think I have the different parts figured out but can't make them fit together.
For each file, process it.
Write each file's grepped lines to the output.
Append the result to a file.
I've tried calling through.push(<output per file>) from the plugin, but the following step expects a file, not a string.
From there, I expect I could do something like gulp-concat or a stream merge on the results and pipe it on to gulp.dest, but there's a bit missing here.
I figured out a way: simply replace the Vinyl file's contents with the lines to output, and push that with through.push.

Piping SVG file into ImageMagick

Sorry if this belongs on serverfault.
I'm wondering what the proper way is to use an SVG (XML) string as standard input for a "convert msvg:- jpeg:- 2>&1" command (on Linux).
Currently I'm just saving a temp file to use as input, but the data originates from an API in my case, so feeding the string directly to the command would obviously be most efficient.
I appreciate everyone's help. Thanks!
This should work:
convert - output.jpg
Example:
convert logo: logo.svg
cat logo.svg | convert - logo.jpg
Explanation:
The example's first line creates an SVG file and writes it to disk. This is only a preparatory step so that we can run the second line.
The second line is a pipeline of two commands: cat streams the bytes of the file to stdout (standard output). Its only job is to give the next command in the pipeline something to read in.
This next command is convert.
The - character is a way to tell convert to read its input data not from disk, but from stdin (standard input).
So convert reads its input data from its stdin and writes its JPEG output to the file logo.jpg.
So my first command/line is similar to your step described as 'currently I'm just saving a temp file to use as input'.
My second command/line does not use your API (I don't have access to it, do I?), but it demonstrates a different method of 'feeding a string directly to the command'.
So the most important lesson is this: wherever convert would usually read input from a file and where you would write the file's name on the command line, you can replace the filename with - to tell convert it should read from stdin. (But you need to make sure that there is actually something offered on convert's standard input which it can digest...)
Sorry, I can't explain better than this...
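Since the data comes from an API rather than a file, here is a minimal sketch of the same idea from Python, assuming ImageMagick's convert is on the PATH; svg_data is a hypothetical placeholder for the string your API returns (the 2>&1 redirection from the question is omitted, since stderr can be captured separately if needed):
import subprocess

# Hypothetical stand-in for the SVG string returned by the API.
svg_data = '<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10"/>'

result = subprocess.run(
    ['convert', 'msvg:-', 'jpeg:-'],  # read SVG from stdin, write JPEG to stdout
    input=svg_data.encode('utf-8'),   # feed the string directly, no temp file
    stdout=subprocess.PIPE,
    check=True,
)
jpeg_bytes = result.stdout            # raw JPEG bytes, ready to store or send on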
