Removing the first word in each line in python - python-3.x

My Text file currently looks like this:
1 1.094141 -19.991062 -0.830169
2 0.506693 -19.613609 -2.876364
3 -0.355470 -18.932575 -4.884786
4 -0.354663 -27.707542 -21.295307
5 1.008405 -18.191206 -4.542386
6 2.663746 -19.178164 -5.195459
10 0.245458 -17.983212 -2.999652
11 1.411953 -20.360981 -4.684113
I need a program to remove the first word (the leading index number) from each line to make it look like:
1.094141 -19.991062 -0.830169
0.506693 -19.613609 -2.876364
-0.355470 -18.932575 -4.884786
-0.354663 -27.707542 -21.295307
1.008405 -18.191206 -4.542386
2.663746 -19.178164 -5.195459
0.245458 -17.983212 -2.999652
1.411953 -20.360981 -4.684113
How do I do this in Python? I have more than 200 files with similar data, and I need to delete the first word from each line. Please help me with the code. Thank you! :)
Well, I am also trying to do other things but I want to fix the logic in this code of mine.
import numpy as np
with open('test2.txt') as f1:
    lines = f1.readlines()
with open('test2.txt') as infile:
    with open('Output.txt', 'a') as outfile:
        outfile.write('# vtk Datafile Version 3.0 \n')
        outfile.write('Unstructured Grid.. \n')
        outfile.write('ASCII\n')
        copy = False
        for line in infile:
            if line.strip() == "651734":
                copy = True
            elif line.strip() == "$EndNodes":
                copy = False
            elif line.strip() == "3089987":
                copy = True
            elif copy:
                outfile.write(line)

The following loop takes each line you stored in the lines variable (line 4 of your code), splits it at the first space, and keeps only what follows, which removes the word that comes before the first space.
for line, i in enumerate(lines):
    lines[i] = line.split(" ", 1)[1]
Keep in mind that this will only work if your line always follows the layout you outlined above.
Read up on how to use split properly here and, of course, study the Python documentation again carefully.
Having said that, it also looks like the second with open('test2.txt') is superfluous; you have already stored the lines of that file in your lines variable on line 4, so at that point you're just wasting space and memory.
You should probably sketch out your idea again, before you continue writing your program. Right now it's quite redundant and not very well thought through.

The above code is almost correct, but the loop variables are swapped: enumerate yields (index, item) pairs, so the index i must come first and line second.
for i, line in enumerate(lines):
    lines[i] = line.split(" ", 1)[1]
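For the 200 files mentioned in the question, the same split can be applied file by file. The following is only a minimal sketch: the function names, the "*.txt" pattern, and the ".out" output suffix are assumptions made for illustration, not part of the original question or answers. It splits on any whitespace rather than a single space, and it leaves lines with fewer than two fields unchanged.

```python
from pathlib import Path


def drop_first_word(line):
    """Return the line without its first whitespace-separated word."""
    parts = line.split(None, 1)  # split once, on any run of whitespace
    # Lines with fewer than two fields (e.g. blank lines) are left unchanged.
    return parts[1] if len(parts) > 1 else line


def strip_index_column(src, dst):
    """Copy src to dst, dropping the first word of every line."""
    with open(src) as infile, open(dst, "w") as outfile:
        for line in infile:
            outfile.write(drop_first_word(line))


def process_folder(folder, pattern="*.txt"):
    """Hypothetical batch helper: write a .out file next to each input file."""
    for path in Path(folder).glob(pattern):
        strip_index_column(path, path.with_suffix(".out"))
```

Calling process_folder("data") would then write one .out file per matching input, leaving the originals untouched.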

Related

Problem with reading text then put the text to the list and sort them in the proper way

Open the file romeo.txt and read it line by line. For each line, split the line into a list of words using the split() method. The program should build a list of words. For each word on each line check to see if the word is already in the list and if not append it to the list. When the program completes, sort and print the resulting words in alphabetical order.
That is the exercise. My problem is that I cannot write code that gathers the data correctly: my code always gives me 4 separate lists, one for each row!
This is my code:
fname = input("Enter file name: ")
fh = open(fname)
lst = list()
for line in fh:
    line = line.rstrip()
    line = line.split()
    if line in lst:
        print(true)
    else:
        lst.append(line)
print(lst)
The text is here; please copy and paste it into a text editor:
But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief
You are not checking the presence of individual words in the list, but rather the presence of the entire list of words in that line.
With some modifications, you can achieve what you are trying to do this way:
fname = input("Enter file name: ")
fh = open(fname)
lst = list()
for line in fh:
    line = line.rstrip()
    words = line.split()
    for word in words:
        if word not in lst:
            lst.append(word)
print(lst)
However, a few things I would like to point out looking at your code:
Why are you using rstrip() instead of strip()?
It is better to write lst = [] than lst = list(). It is shorter, faster, and more Pythonic. A more descriptive name than lst (for example, words) would also be less confusing; just avoid calling the variable list, because that shadows the built-in type.
You will probably want to remove punctuation marks attached to words (e.g. , . :), which do not get removed by split().
If you want a loop body to not do anything, use pass. Why are you printing true? Also, in Python, it's True and not true.
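The points above, together with the sorting step the exercise asks for, can be sketched as follows. This is an illustrative variant, not the original answer's code: it swaps the list-membership check for a set (which removes duplicates on its own), and the function name and sample data are assumptions.

```python
import string


def unique_sorted_words(lines):
    """Collect each distinct word once and return them in sorted order."""
    words = set()
    for line in lines:
        for word in line.split():
            # Strip punctuation attached to either end of the word.
            cleaned = word.strip(string.punctuation)
            if cleaned:
                words.add(cleaned)
    # Note: sorted() places uppercase words before lowercase ones.
    return sorted(words)


sample = [
    "But soft what light through yonder window breaks",
    "It is the east and Juliet is the sun",
]
print(unique_sorted_words(sample))
```

Because a file object is iterable line by line, the same function could be called with an open file in place of the sample list.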

How can I expand List capacity in Python?

read = open('700kLine.txt')
# use readline() to read the first line
line = read.readline()
aList = []
for line in read:
    try:
        num = int(line.strip())
        aList.append(num)
    except:
        print ("Not a number in line " + line)
read.close()
print(aList)
There are 700k lines in that file (each line holds a number of at most 2 digits).
I can only get ~280k of those lines into my aList.
So, how can I expand the capacity of aList from 280k to 700k or more? (Is there a different solution for this case?)
Hello, I just solved that problem. Thanks for all your help. It was an obvious buffer problem.
The solution is simply to increase the size of the buffer. See:
Increase output buffer when running or debugging in PyCharm
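One way to tell a truncated console apart from a truncated list is to inspect the list's length instead of printing every element. The sketch below assumes that reading of the problem (the list is complete and only the printed output was cut off); the function name is hypothetical.

```python
def parse_numbers(lines):
    """Convert each line to an int, reporting lines that are not numbers."""
    numbers = []
    for line_no, line in enumerate(lines, start=1):
        try:
            numbers.append(int(line.strip()))
        except ValueError:
            print("Not a number on line {}: {!r}".format(line_no, line))
    return numbers


# Usage with the file from the question (a file object yields its lines):
# with open('700kLine.txt') as f:
#     numbers = parse_numbers(f)
# print(len(numbers))               # how many values were actually stored
# print(numbers[:5], numbers[-5:])  # a small sample from both ends
```

If len(numbers) matches the number of lines in the file, the list holds everything and only the display was limited.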
Please try this.
filename = '700kLine.txt'
with open(filename) as f:
    data = f.readlines()
print(data)
print(type(data))  # readlines() stores the data in a list
Yes, you can.
Once a list is defined, you can add, edit or delete its elements. To add more elements at the end, use the append function:
MyList.append(data)
Where MyList is the name of the list and data is the element you want to add.
I tried to re-create your problem:
# creating 700kLine file
with open('700kLine.txt', 'w') as f:
    for i in range(700000):
        f.write(str(i+1) + '\n')
# creating list from file entries
aList = []
with open('700kLine.txt', 'r') as f:
    for line in f:
        num = int(line.strip())
        aList.append(num)
# print(aList)
print(aList[:30])
Jupyter notebook throws an error while printing all 700K lines due to too much memory used. If you really want to print all 700k values, run the python script from terminal.
Could it be that your computer ran out of memory while processing the file? I tried appending one number at a time to a list in an infinite loop and ended up with about 47 million elements (len(list) was 47119572). The code I used for the test is below.
I tried the same code on an online REPL and it reached a significantly lower len(list).
list = []
while True:
    try:
        if len(list) > 0:
            list.append(list[-1] + 1)
        else:
            list.append(1)
    except MemoryError:
        print("memory error, last count is: ", list[-1])
        raise MemoryError
Maybe try saving bits of data read instead of reading the whole file at once?
Just my assumption.

How to add the line number at the beginning of each line in a file

So.. I need to read a file and add the line number at the beginning of each line. Just as the title. How do you do it?
For example, if the content of the file was:
This
is
a
simple
test
file
I should turn these 6 lines into:
1. This
2. is
3. a
4. simple
5. test
6. file
That is, keep the original content and just add the line number at the beginning.
My code looks like this so far:
def add_numbers(filename):
f = open(filename, "w+")
line_number = 1
for line in f.readlines():
number_added = str(line_number) + '. ' + f.readline(line)
line_number += 1
return number_added
But it doesn't really show anything as the result. I have no clues how to do it. Any help?
A few problems I see in your code:
Your indentation is not correct. Everything below the def add_numbers(): should be indented one level.
Opening the file with "w+" truncates it, so there is nothing left to read.
It is good practice to close a file handle at the end of your method.
A similar question to yours was asked here. Looking at the various solutions posted there, using fileinput seems like your best bet because it allows you to edit your file in-place.
import fileinput
def add_numbers(filename):
    line_number = 1
    for line in fileinput.input(filename, inplace=True):
        # line already ends with a newline, so suppress the one print adds
        print("{}. {}".format(line_number, line), end="")
        line_number += 1
Also note that I use format to combine the two values instead of adding strings together, because this handles different variable types more easily. A good explanation of the use of format can be found here.
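For comparison, here is a minimal sketch of the same task without fileinput. It is an alternative under stated assumptions, not the answer's method: it reads the whole file into memory before rewriting it, and the helper name number_lines is made up for illustration.

```python
def number_lines(lines):
    """Prefix each line with its 1-based line number."""
    return ["{}. {}".format(n, line) for n, line in enumerate(lines, start=1)]


def add_numbers(filename):
    # Read everything first; opening with "w" afterwards truncates the file.
    with open(filename) as f:
        numbered = number_lines(f)
    with open(filename, "w") as f:
        f.writelines(numbered)
```

enumerate with start=1 replaces the manual line_number counter, and the with blocks close the file without an explicit close() call.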

Python code to read first 14 characters, uniquefy based on them, and parse duplicates

I have a list of more than 10k strings that look like different versions of this: HN5ML6A02FL4UI_3 (14 numbers or letters, followed by _1 to _6). Some are duplicates except for the _1 to _6 suffix.
I am trying to find a way to list these and remove the entries whose 14-character prefix (the part before the _1 to _6) is a duplicate.
Example of part of the list:
HN5ML6A02FL4UI_3
HN5ML6A02FL4UI_1
HN5ML6A01BDVDN_6
HN5ML6A01BDVDN_1
HN5ML6A02GVTSV_3
HN5ML6A01CUDA2_1
HN5ML6A01CUDA2_5
HN5ML6A02JPGQ9_5
HN5ML6A02JI8VU_1
HN5ML6A01AJOJU_5
I have tried versions of scripts using regular expressions, such as var n = /\d+/.exec(info)[0];, which were posted in answers to my previous question.
I also used a modified version of the code from: How can I strip the first 14 characters in an list element using python?
More recently I used this script and I am still not getting the correct output.
import os, re
def trunclist('rhodopsins_play', 'hope4'):
    with open('rhodopsins_play','r') as f:
        newlist=[]
        trunclist=[]
        for line in f:
            if line.strip().split('_')[0] not in trunclist:
                newlist.append(line)
                trunclist.append(line.split('_')[0])
    print newlist, trunclist
    # write newlist to file, with carriage returns
    with open('hope4','w') as out:
        for line in newlist:
            out.write(line)
My inputfile.txt contains more than 10k entries that look like the list above, where the important part is the characters in front of the '_' (underscore). The script should then output a file of the unique entries, in the form ABCD12356_1.
Can someone help?
Thank you for your help
Run this script, which is similar to the one above. It splits each line at the '_'. This worked on the file:
def trunclist(inputfile, outputfile):
    with open(inputfile,'r') as f:
        newlist=[]
        trunclist=[]
        for line in f:
            if line.strip().split('_')[0] not in trunclist:
                newlist.append(line)
                trunclist.append(line.split('_')[0])
    print(newlist, trunclist)
    # write newlist to file, with carriage returns
    with open(outputfile,'w') as out:
        for line in newlist:
            out.write(line)
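With more than 10k entries, checking membership in a list gets slower as the list grows. A possible variant, sketched here with a set for the prefixes already seen, keeps the first entry for each prefix; the function name and the sep parameter are assumptions for illustration only.

```python
def unique_by_prefix(lines, sep="_"):
    """Keep the first entry seen for each prefix before the separator."""
    seen = set()
    kept = []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        prefix = entry.split(sep, 1)[0]
        if prefix not in seen:
            seen.add(prefix)
            kept.append(entry)
    return kept


ids = ["HN5ML6A02FL4UI_3\n", "HN5ML6A02FL4UI_1\n", "HN5ML6A01BDVDN_6\n"]
print(unique_by_prefix(ids))
```

The returned entries have their newlines stripped, so writing them to a file would need a "\n" appended to each.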

python3 opening files and reading lines

Can you explain what is going on in this code? I don't seem to understand
how you can open the file and read it line by line instead of all of the sentences at the same time in a for loop. Thanks
Let's say I have these sentences in a document file:
cat:dog:mice
cat1:dog1:mice1
cat2:dog2:mice2
cat3:dog3:mice3
Here is the code:
from sys import argv
filename = input("Please enter the name of a file: ")
f = open(filename,'r')
d1ct = dict()
print("Number of times each animal visited each station:")
print("Animal Id Station 1 Station 2")
for line in f:
    if '\n' == line[-1]:
        line = line[:-1]
    (AnimalId, Timestamp, StationId,) = line.split(':')
    key = (AnimalId,StationId,)
    if key not in d1ct:
        d1ct[key] = 0
    d1ct[key] += 1
The magic is at:
for line in f:
    if '\n' == line[-1]:
        line = line[:-1]
Python file objects are special in that they can be iterated over in a for loop. On each iteration, it retrieves the next line of the file. Because it includes the last character in the line, which could be a newline, it's often useful to check and remove the last character.
As Moshe wrote, open file objects can be iterated. Only, they are not of the file type in Python 3.x (as they were in Python 2.x). If the file object is opened in text mode, then the unit of iteration is one text line including the \n.
You can use line = line.rstrip() to remove the \n plus any trailing whitespace.
If you want to read the content of the file at once (into a multiline string), you can use content = f.read().
There is a minor bug in the code: the open file should always be closed. That means calling f.close() after the for loop. Alternatively, you can wrap the open in the newer with construct, which will close the file for you. I suggest getting used to the latter approach.
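As a rough illustration of the with construct and of line-by-line iteration, the counting loop from the question could be written as below. The function names are hypothetical, and collections.Counter is swapped in for the manual "if key not in d1ct" check.

```python
from collections import Counter


def count_visits(lines):
    """Count how often each (animal, station) pair occurs."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")  # drop the trailing newline, if any
        if not line:
            continue  # ignore empty lines
        animal_id, timestamp, station_id = line.split(":")
        counts[(animal_id, station_id)] += 1
    return counts


def count_visits_in_file(filename):
    # The with block closes the file even if an error occurs while reading.
    with open(filename) as f:
        return count_visits(f)
```

Passing the open file straight to count_visits works because iterating over a text-mode file object yields one line at a time.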

Resources