Python code to read first 14 characters, uniquefy based on them, and parse duplicates - string

I have a list of more than 10k strings that look like variations of HN5ML6A02FL4UI_3 (14 letters or digits, followed by _1 to _6). Some are duplicates except for the _1 to _6 suffix.
I am trying to find a way to list these and remove the entries whose first 14 characters (the part before the _1 to _6) duplicate an earlier entry.
Example of part of the list:
HN5ML6A02FL4UI_3
HN5ML6A02FL4UI_1
HN5ML6A01BDVDN_6
HN5ML6A01BDVDN_1
HN5ML6A02GVTSV_3
HN5ML6A01CUDA2_1
HN5ML6A01CUDA2_5
HN5ML6A02JPGQ9_5
HN5ML6A02JI8VU_1
HN5ML6A01AJOJU_5
I have tried scripts using regular expressions, such as var n = /\d+/.exec(info)[0];, which were posted in answers to my previous question.
I also used a modified version of the code from: How can I strip the first 14 characters in an list element using python?
More recently I used this script, and I am still not getting the correct output.
import os, re
def trunclist('rhodopsins_play', 'hope4'):
    with open('rhodopsins_play','r') as f:
        newlist=[]
        trunclist=[]
        for line in f:
            if line.strip().split('_')[0] not in trunclist:
                newlist.append(line)
                trunclist.append(line.split('_')[0])
    print newlist, trunclist
    # write newlist to file, with carriage returns
    with open('hope4','w') as out:
        for line in newlist:
            out.write(line)
My inputfile.txt contains more than 10k lines that look like the list above, where the important part is the characters in front of the '_' (underscore). I want to output a file of the uniquified entries, in the form ABCD12356_1.
Can someone help?
Thank you for your help

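Import this function into Python and run it. It is similar to the script above, but it takes the file names as parameters instead of string literals. It splits each line at the '_' and keeps the first line seen for each prefix. This worked on the file.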
def trunclist(inputfile, outputfile):
    with open(inputfile,'r') as f:
        newlist=[]
        trunclist=[]
        for line in f:
            if line.strip().split('_')[0] not in trunclist:
                newlist.append(line)
                trunclist.append(line.split('_')[0])
    print newlist, trunclist
    # write newlist to file, with carriage returns
    with open(outputfile,'w') as out:
        for line in newlist:
            out.write(line)
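For a list of 10k or more entries, checking membership in a list gets slow, because each check scans the whole list. Below is a minimal Python 3 sketch of the same idea using a set for the seen prefixes. The function name unique_by_prefix and its sep parameter are hypothetical, chosen only for this illustration; the sample data is taken from the question.

```python
def unique_by_prefix(lines, sep="_"):
    """Keep the first entry seen for each prefix before `sep`."""
    seen = set()   # prefixes already encountered (fast membership test)
    kept = []      # first entry for each prefix, in input order
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        prefix = entry.split(sep, 1)[0]
        if prefix not in seen:
            seen.add(prefix)
            kept.append(entry)
    return kept


sample = [
    "HN5ML6A02FL4UI_3\n",
    "HN5ML6A02FL4UI_1\n",
    "HN5ML6A01BDVDN_6\n",
    "HN5ML6A01BDVDN_1\n",
    "HN5ML6A02GVTSV_3\n",
]
for entry in unique_by_prefix(sample):
    print(entry)
```

To use it on a file, pass the open file object as lines and write each returned entry followed by a newline.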

Related

Remove that extra line (called a newline) when printing in python3

I'm a newbie in Python 3 and stuck trying to remove the \n from the output of the code below. I want to print two random lines without the \n and without the square brackets [ ]. What should I do?
The code is:
import random
def head():
    f = open("quotes.txt")
    quotes = f.readlines()
    f.close()
    last=18
    print(random.sample(quotes,2))
if __name__== "__main__":
    head()
When I executed this file, it returned two random lines, which is fine, but in a format like this, with the \n included:
['IMPOSSIBLE says itself I M POSSIBLE\n', 'Never stops to Learning till dead end\n']
You are getting results like ['IMPOSSIBLE says itself I M POSSIBLE\n', 'Never stops to Learning till dead end\n'] because random.sample() returns a list and you are printing that list directly, so Python shows the brackets, the quotes, and the escaped \n of each string.
Solution
Remove print(random.sample(quotes,2)) and add the following code:
tmp = random.sample(quotes,2)
for i in tmp:
    print(i,end="")
This will solve your problem. The end="" argument is there because each quote already ends with a newline, so we prevent print from adding an extra \n.
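An alternative is to strip the newline from each line first and then join the chosen lines for printing. The sketch below assumes Python 3; the helper name pick_quotes is hypothetical, and the third quote in the sample list is made up so that there is something to choose from.

```python
import random


def pick_quotes(lines, k=2):
    """Return k random lines with the trailing newline removed."""
    cleaned = [line.rstrip("\n") for line in lines if line.strip()]
    return random.sample(cleaned, k)


quotes = [
    "IMPOSSIBLE says itself I M POSSIBLE\n",
    "Never stops to Learning till dead end\n",
    "A third quote, made up for this example\n",
]
# Joining with "\n" prints one quote per line, with no brackets or quotes
print("\n".join(pick_quotes(quotes)))
```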
It's resolved!
I had run the code with the python command, which was Python 2.7 on my machine, and that produced this kind of junk output. It works fine when executed with the python3 command.

Converting string to dictionary from a opened file

A text file contains a dictionary, as below:
{
"A":"AB","B":"BA"
}
Below is the code of the Python file:
with open('devices_file') as d:
    print (d["A"])
The result should be AB.
As @rassar and @Ivrf suggested in the comments, you can use either ast.literal_eval() or json.load() to achieve this. Both code snippets output AB.
Solution with ast.literal_eval():
import ast
with open("devices_file", "r") as d:
    content = d.read()
    result = ast.literal_eval(content)
    print(result["A"])
Solution with json.load():
import json
with open("devices_file") as d:
    content = json.load(d)
    print(content["A"])
See the Python documentation for ast.literal_eval() and json.load().
Also: I noticed that the code snippet in your question is not formatted conventionally. Indented lines should be indented with 4 spaces, and by convention (PEP 8) there is no space between print and its opening parenthesis.
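The two parsers agree on the file from the question, but they accept different syntax. The sketch below holds the file content in a string so that it runs without a file on disk; the single-quoted variant is an assumption added only to show where the two differ.

```python
import ast
import json

# The file content from the question, held in a string for the demo
text = '{\n"A":"AB","B":"BA"\n}'

from_ast = ast.literal_eval(text)   # parses Python literal syntax
from_json = json.loads(text)        # parses JSON syntax
print(from_ast["A"], from_json["A"])

# Assumption for illustration: the same kind of data written with single quotes
python_style = "{'A': 'AB'}"
print(ast.literal_eval(python_style)["A"])
try:
    json.loads(python_style)
    json_accepted = True
except json.JSONDecodeError:
    # JSON requires double-quoted strings
    json_accepted = False
print("json accepted single quotes:", json_accepted)
```

If the file is produced by another program as JSON, json.load() is usually the better fit; ast.literal_eval() is for text written as a Python literal.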

IndexError: list index out of range, but list length OK

New to programming, looking for a deeper understanding of what's happening.
Goal: open a file and print the first 10 lines. (similar to head command)
Code:
with open('file') as f:
    for i in range(0,10):
        print([line.strip('\n') for line in f][i])
Result: prints first line fine, then returns the out of range error
File: Is a simple text file with 20 lines, no more than 50 chars per line
FYI - Removed range line and printed both type(list) and length(20). Printed specific indexes without issue (unless >1 in a row)
Able to get the desired result with different code, but trying to improve using with/as
You can actually iterate over a file, which is what you should be doing here.
with open('file') as f:
    for i, line in enumerate(f, start=1):
        # Get out of the loop once we are past the 10th line
        if i > 10:
            break
        # Line already has a '\n' at the end
        print(line, end='')
The reason that your code is failing is because of your list comprehension:
[line.strip('\n') for line in f]
The first time through your loop, that consumes all of the lines in your file. Now your file has no more lines, so the next time through it creates a list of the remaining lines, which is empty, and tries to get the [1]st element. But that doesn't exist, because the file is already at its end and there is nothing left to read.
If you wanted to keep your code mostly as-is you could do
with open('file') as f:
    lines = [line.rstrip('\n') for line in f]
for i in range(10):
    print(lines[i])
But that's also silly, because you could just do
lines = f.readlines()
But that's also silly if you just want up to the 10th line, because you could do this:
with open('file') as f:
    # Each line already ends with '\n', so join with the empty string
    print(''.join(f.readlines()[:10]), end='')
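The readlines() versions above still read the whole file into memory. As a sketch of a head-like helper that stops reading after n lines, itertools.islice can be applied to the file object. The name head and the in-memory io.StringIO stand-in for the 20-line file are assumptions for this example.

```python
import io
from itertools import islice


def head(lines, n=10):
    """Return up to the first n lines, without the trailing newline."""
    return [line.rstrip("\n") for line in islice(lines, n)]


# Stand-in for the 20-line text file from the question
fake_file = io.StringIO("".join(f"line {i}\n" for i in range(1, 21)))
for text in head(fake_file):
    print(text)
```

With a real file, the call would be head(f) inside a with open('file') as f: block. If the file has fewer than n lines, islice simply stops early instead of raising an error.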
Some further explanation:
The shortest and worst way you could fix your code is by adding one line of code:
with open('file') as f:
    for i in range(0,10):
        f.seek(0) # Add this line
        print([line.strip('\n') for line in f][i])
Now your code will work - but this is a horrible way to get your code to work. The reason that your code isn't working the way you expect in the first place is that files are consumable iterators. That means that when you read from them eventually you run out of things to read. Here's a simple example:
import io
file = io.StringIO('''
This is is a file
It has some lines
okay, only three.
'''.strip())
for line in file:
    print(file.tell(), repr(line))
This outputs
18 'This is is a file\n'
36 'It has some lines\n'
53 'okay, only three.'
Now if you try to read from the file:
print(file.read())
You'll see that it doesn't output anything. That's because you've "consumed" the file. I mean obviously it's still on disk, but the iterator has reached the end of the file. But as shown, you can seek in the file.
print(file.tell())
file.seek(0)
print(file.tell())
print(file.read())
And you'll see your entire file printed. But what about those other positions?
file.seek(36)
print(file.read()) # => okay, only three.
As a side note, you can also specify how much to read:
file.seek(36)
print(file.read(4)) # => okay
print(file.tell()) # => 40
So when we read from a file or iterate over it we consume the iterator and get to the end of the file. Let's put your new tools to work and go back to your original code and explore what's happening.
with open('file') as f:
    print(f.tell())
    lines = [line.rstrip('\n') for line in f]
    print(f.tell())
    print(len([line for line in f]))
    print(lines)
You'll see that you're at a different location in the file. And the second list comprehension produces an empty list. That's because when a list comprehension is evaluated it executes immediately. So when you do this:
for i in range(10):
    print([line.strip('\n') for line in f][i])
The first time through, i = 0, and the list comprehension reads to the end of the file. Then it takes the [0]th element of the list, which is the first line in the file. But your file iterator is now at the end of the file.
So now we get back to the beginning of the loop and i = 1. We iterate to the end of the file again, but we're already at the end, so there are no lines to read, and we've got an empty list [] that we try to get the [1]st element of. But there's nothing there. So we get an IndexError.
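The sequence just described can be reproduced without a file on disk. The sketch below uses io.StringIO as a stand-in for the 20-line file and reports the first value of i at which the indexing fails; the helper name first_failing_index is hypothetical.

```python
import io


def first_failing_index(text, attempts=10):
    """Repeat the question's loop and return the first i that raises IndexError."""
    f = io.StringIO(text)
    for i in range(attempts):
        try:
            # Same expression as in the question: rebuild the list, then index it
            [line.strip("\n") for line in f][i]
        except IndexError:
            return i
    return None


twenty_lines = "".join(f"line {n}\n" for n in range(1, 21))
print(first_failing_index(twenty_lines))  # prints 1
```

The failure at i = 1 does not depend on the file having 20 lines: after the first pass the iterator is exhausted, so every later list is empty.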
List comprehensions can be useful, but when you're beginning it's usually much easier to write a for loop and then turn it into a list comprehension. So you might write something like this:
with open('file') as f:
    for i, line in enumerate(f):
        if i < 10:
            print(line.rstrip())
Now, we shouldn't print inside a list comprehension, so instead we'll collect everything. We start out by putting what we want:
[line.rstrip()
Now add the for bit:
[line.rstrip() for i, line in enumerate(f)
And finally add the filter and our closing bracket:
[line.rstrip() for i, line in enumerate(f) if i < 10]
For more on list comprehensions, this is a fantastic resource: http://treyhunner.com/2015/12/python-list-comprehensions-now-in-color/

How to print multiple lines from a file python

I'm trying to print several lines from a text file in Python. My current code is:
f = open("sample.txt", "r").readlines()[2 ,3]
print(f)
However, I'm getting this error message:
TypeError: list indices must be integers, not tuple
Is there any way of fixing this, or of printing multiple lines from a file without printing them out individually?
You are trying to pass a tuple to the [...] subscription operation; 2 ,3 is a tuple of two elements:
>>> 2 ,3
(2, 3)
You have a few options here:
Use slicing to take a sublist from all the lines. [2:4] starts at index 2 (the 3rd line) and stops before index 4, so it returns the 3rd and 4th lines:
f = open("sample.txt", "r").readlines()[2:4]
Store the lines and print specific indices, one by one:
f = open("sample.txt", "r").readlines()
print(f[2].rstrip())
print(f[3].rstrip())
I used str.rstrip() to remove the newline that's still part of the line before printing.
Use itertools.islice() and use the file object as an iterable. This is the most efficient method, because no line needs to stay in memory for longer than it takes to print it:
from itertools import islice
with open("sample.txt", "r") as f:
    for line in islice(f, 2, 4):
        print(line.rstrip())
I also used the file object as a context manager to ensure it is closed again properly once the with block is done.
Assign the whole list of lines to a variable, and then print lines 2 and 3 separately.
with open("sample.txt", "r") as fin:
    lines = fin.readlines()
print(lines[2])
print(lines[3])
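If the wanted lines are not next to each other, a slice no longer fits. One possible approach is to walk the file with enumerate() and keep only the wanted indices, as in the sketch below. The helper name pick_lines and the in-memory sample are assumptions for this example; indices are zero-based, as in the answers above.

```python
import io


def pick_lines(lines, wanted):
    """Return the lines at the given zero-based indices, in file order."""
    wanted = set(wanted)
    if not wanted:
        return []
    last = max(wanted)
    picked = []
    for index, line in enumerate(lines):
        if index in wanted:
            picked.append(line.rstrip("\n"))
        if index >= last:
            break  # nothing wanted beyond this point, stop reading
    return picked


sample = io.StringIO("zero\none\ntwo\nthree\nfour\n")
print(pick_lines(sample, [2, 3]))  # ['two', 'three']
```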

Removing the first word in each line in python

My Text file currently looks like this:
1 1.094141 -19.991062 -0.830169
2 0.506693 -19.613609 -2.876364
3 -0.355470 -18.932575 -4.884786
4 -0.354663 -27.707542 -21.295307
5 1.008405 -18.191206 -4.542386
6 2.663746 -19.178164 -5.195459
10 0.245458 -17.983212 -2.999652
11 1.411953 -20.360981 -4.684113
I need a program to remove the first word (the leading index) from each line to make it look like:
1.094141 -19.991062 -0.830169
0.506693 -19.613609 -2.876364
-0.355470 -18.932575 -4.884786
-0.354663 -27.707542 -21.295307
1.008405 -18.191206 -4.542386
2.663746 -19.178164 -5.195459
0.245458 -17.983212 -2.999652
1.411953 -20.360981 -4.684113
How do I do this in Python? I have more than 200 files with similar data and I need to delete the first word from each line. Please help me with the code. Thank you! :)
Well, I am also trying to do other things but I want to fix the logic in this code of mine.
import numpy as np

with open('test2.txt') as f1:
    lines = f1.readlines()
with open('test2.txt') as infile:
    with open('Output.txt', 'a') as outfile:
        outfile.write('# vtk Datafile Version 3.0 \n')
        outfile.write('Unstructured Grid.. \n')
        outfile.write('ASCII\n')
        copy = False
        for line in infile:
            if line.strip() == "651734":
                copy = True
            elif line.strip() == "$EndNodes":
                copy = False
            elif line.strip() == "3089987":
                copy = True
            elif copy:
                outfile.write(line)
The following lines will split each entry of the lines variable (which you fill on line 4 of your code) at the first space, and remove the word that comes before it.
for line, i in enumerate(lines):
    lines[i] = line.split(" ", 1)[1]
Keep in mind that this will only work if your line always follows the layout you outlined above.
Read up on how to use split properly here
and, of course, study the python documents again carefully.
Having said that, it also looks like the second with open('test2.txt') is superfluous. You have already stored the lines of that file in your lines variable on line 4, so right there you're just wasting space and memory.
You should probably sketch out your idea again, before you continue writing your program. Right now it's quite redundant and not very well thought through.
The above code is almost correct, but the loop variables are swapped. enumerate() yields (index, item) pairs, so the index must come first; as written, "line" receives the index and "i" receives the text.
for i, line in enumerate(lines):
    lines[i] = line.split(" ", 1)[1]
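Splitting on a single space raises an IndexError for a line that contains no space, and it leaves extra spaces behind if the columns are separated by more than one. A slightly more forgiving sketch splits on any run of whitespace instead. The helper name drop_first_field is hypothetical; the sample rows are taken from the question.

```python
def drop_first_field(line):
    """Remove the first whitespace-separated word; return the rest without the newline."""
    parts = line.strip().split(None, 1)  # sep=None splits on any run of whitespace
    return parts[1] if len(parts) > 1 else ""


sample = [
    "1 1.094141 -19.991062 -0.830169\n",
    "10 0.245458 -17.983212 -2.999652\n",
]
for row in sample:
    print(drop_first_field(row))
```

For the 200 files, the same function can be applied line by line while writing each result, followed by a newline, to an output file.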
