Python split method removing spaces....why? - python-3.x

I have this code doing what I want (take a file, shuffle the middle letters of the words, and rejoin them), but for some reason the spaces are being removed, even though I'm asking it to split on spaces. Why is that?
import random

File_input = str(input("Enter file name here:"))
text_file = None
try:
    text_file = open(File_input)
except FileNotFoundError:
    print("Please check file name.")
if text_file:
    for line in text_file:
        for word in line.split(' '):
            words = list(word)
            Internal = words[1:-1]
            random.shuffle(Internal)
            words[1:-1] = Internal
            Shuffled = ' '.join(words)
            print(Shuffled, end='')

If you want the delimiter as part of the values:
d = " " #delim
line = "This is a test" #string to split, would be `line` for you
words = [e+d for e in line.split(d) if e != ""]
What this does is split the string but give back each piece plus the delimiter used. The result is still a list, in this case ['This ', 'is ', 'a ', 'test '].
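For the question's use case, note that joining those pieces back with an empty string reproduces the original spacing (with one extra trailing delimiter); a tiny illustration:
d = " "
line = "This is a test"
words = [e + d for e in line.split(d) if e != ""]
print(''.join(words))  # -> "This is a test " (note the extra trailing space)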
If you want the delimiter as part of the resultant list, instead of using the regular str.split(), you can use re.split(). The docs note:
re.split(pattern, string[, maxsplit=0, flags=0])
Split string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.
So, you could use:
import re
re.split("( )", "This is a test")
And the result:
['This', ' ', 'is', ' ', 'a', ' ', 'test']
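As a sketch of how this could plug into the shuffling loop from the question (the file name is a placeholder): the capturing group keeps every space as its own list item, so each word can be shuffled and everything joined back without losing the spacing.
import random
import re

# Sketch only: "example.txt" is a placeholder file name.
with open("example.txt") as text_file:
    for line in text_file:
        # The capturing group keeps the spaces as separate list items.
        pieces = re.split("( )", line.rstrip('\n'))
        shuffled = []
        for piece in pieces:
            letters = list(piece)
            middle = letters[1:-1]      # middle letters only; a lone " " is untouched
            random.shuffle(middle)
            letters[1:-1] = middle
            shuffled.append(''.join(letters))
        print(''.join(shuffled))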

Related

Why can't I add data to my lists?

with open("lineup.txt", "r", encoding="utf-8") as file:
ts = list()
fb = list()
for line in file:
line = line[:-1]
data = line.split(",")
if (data[1] == "Fenerbahçe"):
fb.append(line + "\n")
elif (data[1] == "Trabzonspor"):
ts.append(line + "\n")
with open("ts.txt", "w", encoding="utf-8") as file1:
for i in ts:
file1.write(i)
with open("fb.txt", "w", encoding="utf-8") as file2:
for i in fb:
file2.write(i)
print(fb)
print(ts)
And here is some data from the lineup.txt file:
U. Çakır, Trabozonspor
Marc Bartra, Trabzonspor
İ. Kahveci, Fenerbahçe
S. Aziz, Fenerbahçe
Trezeguet, Trabzonspor
A. Bayındır, Fenerbahçe
Gustavo Henrique, Fenerbahçe
I am getting ∅ (empty) in both lists, so I cannot write the data into the txt files. I can't figure out why.
There may be an issue with your split. If you split on the comma, the second element of your list will start with a space. And if the team name is not the only thing after the comma and space, you won't be able to match the equality anyway.
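If you want to keep the split-based approach, a minimal sketch (not tested against your full file) that strips the stray whitespace from the team field before comparing could look like this:
ts, fb = [], []
with open("lineup.txt", "r", encoding="utf-8") as file:
    for line in file:
        data = line.strip().split(",")
        if len(data) < 2:
            continue                    # skip blank or malformed lines
        team = data[1].strip()          # " Fenerbahçe" -> "Fenerbahçe"
        if team == "Fenerbahçe":
            fb.append(line.strip() + "\n")
        elif team == "Trabzonspor":
            ts.append(line.strip() + "\n")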
Another way to solve this is by using the "in" operator, avoiding the split completely:
for line in file:
    if ("Fenerbahçe" in line):
        fb.append(line[:-1] + "\n")
    elif ("Trabzonspor" in line):
        ts.append(line[:-1] + "\n")
If the matching is more complex than that, you may consider using a regex and exploiting word boundaries before and after the city names.
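A rough sketch of that regex idea (compiled patterns with \b word boundaries, applied line by line):
import re

fb_pattern = re.compile(r"\bFenerbahçe\b")
ts_pattern = re.compile(r"\bTrabzonspor\b")

ts, fb = [], []
with open("lineup.txt", "r", encoding="utf-8") as file:
    for line in file:
        if fb_pattern.search(line):
            fb.append(line.strip() + "\n")
        elif ts_pattern.search(line):
            ts.append(line.strip() + "\n")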

How can I find all the strings that contain "/1" and remove them from a file using Python?

I have a file that contains these kinds of strings, e.g. "1405079/1"; the only thing they have in common is the "/1" at the end. I want to be able to find those strings and remove them. Below is my sample code, but it's not doing anything.
with open("jobstat.txt","r") as jobstat:
with open("runjob_output.txt", "w") as runjob_output:
for line in jobstat:
string_to_replace = ' */1'
line = line.replace(string_to_replace, " ")
with open("jobstat.txt","r") as jobstat:
with open("runjob_output.txt", "w") as runjob_output:
for line in jobstat:
string_to_replace ='/1'
line =line.rstrip(string_to_replace)
print(line)
Anytime you have a "pattern" you want to match against, use a regular expression. The pattern here, given the information you've provided, is a string with an arbitrary number of digits followed by /1.
You can use re.sub to match against that pattern, and replace instances of it with another string.
import re
original_string = "some random text with 123456/1, and midd42142/1le of words"
pattern = r"\d*\/1"
replacement = ""
re.sub(pattern, replacement, original_string)
Output:
'some random text with , and middle of words'
Replacing instances of the pattern with something else:
>>> re.sub(pattern, "foo", original_string)
'some random text with foo, and middfoole of words'
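To actually change the file, the same substitution can be applied line by line while copying jobstat.txt to runjob_output.txt; a minimal sketch:
import re

# Sketch: strip every "<digits>/1" token while copying the file.
pattern = re.compile(r"\d*/1")

with open("jobstat.txt", "r") as jobstat, open("runjob_output.txt", "w") as runjob_output:
    for line in jobstat:
        runjob_output.write(pattern.sub("", line))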

How to complete this Python script to manipulate data in tab delimited file?

I have a list of Part Numbers and Serial Numbers in a tab-delimited file that I need to merge together using a hyphen to make an Asset Number.
This is the input:
Part Number Serial Number
PART1 SERIAL1
,PART2 SERIAL2
, PART3 SERIAL3
This is what I would like as the desired output:
Part Number Serial Number Asset Number
PART1 SERIAL1 PART1-SERIAL1
,PART2 SERIAL2 PART2-SERIAL2
, PART3 SERIAL3 PART3-SERIAL3
I have tried the following code:
import csv
input_list = []
with open('Assets.txt', mode='r') as input:
for row in input:
field = row.strip().split('\t') #Remove new lines and split at tabs
for x, i in enumerate(field):
if i[0] == (','): #If the start of a field starts with a comma
field[x][0] = ('') #Replace that first character with nothing
field[x].lstrip() #Strip any whitespace
print(field)
This code produced the actual output:
['Part Number', 'Serial Number']
['PART1', 'SERIAL1']
['",PART2"', 'SERIAL2']
['", PART3"', 'SERIAL3']
My first problem is that my code to remove the commas and whitespace from the start of all fields fails to work.
The second problem is that there are quotation marks that have been added to the whitespaces.
The third problem is that I don't know how to add another item to the list array (Asset Numbers) so I can join the fields.
Would someone please be able to help me solve any of these problems?
You can strip the commas even when they are not there without any problem, so the if i[0] == (','): check is not needed anymore. You also strip a string, but the value is not stored back into the list. This is fixed here:
input_list = []
with open('Assets.txt', mode='r') as text_file:
    for row in text_file:
        field = row.strip('\n').split('\t')  # Remove new lines and split at tabs.
        for n, word in enumerate(field):
            field[n] = word.lstrip(", ")  # Strip any number of whitespaces and commas.
        print(field)
Output:
['Part Number', 'Serial Number']
['PART1', 'SERIAL1']
['PART2', 'SERIAL2']
['PART3', 'SERIAL3']
So now we can put an Asset_number = field[0] + '-' + field[1] somewhere and it will give you the PARTx-SERIALx value that you wanted.
A little modification to get the desired output:
input_list = []
with open('Assets.txt', mode='r') as text_file:
for m, row in enumerate(text_file):
field = row.strip('\n').split('\t') # Remove new lines and split at tabs.
for n, word in enumerate(field):
field[n] = word.lstrip(", ") # Strip any number of whitespaces and commas.
if m == 0: # Special case for the header.
text_to_print = field[0] + '\t' + field[1] + '\t' + 'Asset Number'
else:
Asset_number = field[0] + '-' + field[1]
text_to_print = field[0] + '\t' + field[1] + '\t' + Asset_number
print(text_to_print)
And the printed output is:
Part Number Serial Number Asset Number
PART1 SERIAL1 PART1-SERIAL1
PART2 SERIAL2 PART2-SERIAL2
PART3 SERIAL3 PART3-SERIAL3
It does not look well aligned here because of how the tabs render, but the strings are right and the tabs are where they are expected, so you should have no problem writing them to a new file instead of printing them:
'Part Number\tSerial Number\tAsset Number'
'PART1\tSERIAL1\tPART1-SERIAL1'
'PART2\tSERIAL2\tPART2-SERIAL2'
'PART3\tSERIAL3\tPART3-SERIAL3'
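A short sketch of that last step, collecting the lines and writing them to a new file instead of printing (the output file name 'Assets_out.txt' is just a placeholder):
output_lines = []
with open('Assets.txt', mode='r') as text_file:
    for m, row in enumerate(text_file):
        field = [word.lstrip(", ") for word in row.strip('\n').split('\t')]
        third = 'Asset Number' if m == 0 else field[0] + '-' + field[1]
        output_lines.append(field[0] + '\t' + field[1] + '\t' + third)

# 'Assets_out.txt' is a placeholder output file name.
with open('Assets_out.txt', mode='w') as out_file:
    out_file.write('\n'.join(output_lines) + '\n')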
import pandas as pd

data = {'Part Number': ['PART1', ', PART2', ', PART3'],
        'Serial Number': ['Serial1', 'Serial2', 'Serial3']}
df = pd.DataFrame(data)
df.loc[:, 'AssetNumber'] = (
    df.loc[:, 'Part Number'].apply(lambda x: str(x).replace(',', '').strip())
    + '-'
    + df.loc[:, 'Serial Number'].apply(lambda x: str(x).replace(',', '').strip())
)
This will do what you want
In your case, as you are dealing with a tab-delimited file, call:
df = pd.read_csv('filepathasstring',sep='\t')
If you have an issue with the rows, check this question:
Reading tab-delimited file with Pandas - works on Windows, but not on Mac
Then you can save as tab delimited by calling:
df.to_csv('filepathasstring', sep='\t')
And here's how to get pandas if you don't have it yet:
https://pandas.pydata.org/pandas-docs/stable/install.html
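Putting those pandas steps together, a minimal end-to-end sketch (the file names are placeholders) might look like:
import pandas as pd

# Sketch: read the tab-delimited file, clean up the Part Number column,
# add the Asset Number column, and write a new tab-delimited file.
df = pd.read_csv('Assets.txt', sep='\t')                     # placeholder input path
df['Part Number'] = df['Part Number'].astype(str).str.lstrip(', ')
df['Serial Number'] = df['Serial Number'].astype(str).str.strip()
df['Asset Number'] = df['Part Number'] + '-' + df['Serial Number']
df.to_csv('Assets_out.txt', sep='\t', index=False)           # placeholder output path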
You can try the code below; it works on the sample input.
input.txt
Part Number Serial Number
PART1 SERIAL1
,PART2 SERIAL2
, PART3 SERIAL3
split_text_add_combine.py
import re

def split_and_combine(in_path, out_path, new_column_name):
    format_string = "{0:20s}{1:20s}{2:20s}"
    new_lines = []  # To store new lines
    # Reading input file to process
    with open(in_path) as f:
        lines = f.readlines()
    for index, line in enumerate(lines):
        line = line.strip()
        # Important: split on runs of whitespace, in case words are separated by more than a single space
        arr = re.split(r"\s{2,}", line)
        if index == 0:
            new_line = format_string.format(arr[0], arr[1], new_column_name) + '\n'
        else:
            # arr = line.split()
            comma_removed_string = (arr[0] + "-" + arr[1]).lstrip(",").lstrip()
            new_line = format_string.format(arr[0], arr[1], comma_removed_string) + '\n'
        new_lines.append(new_line)
    print(new_lines)
    # Writing new lines to: output.txt
    with open(out_path, "w") as f:
        f.writelines(new_lines)

if __name__ == "__main__":
    in_path = "input.txt"
    out_path = "output.txt"
    new_column_name = "Asset Number"
    split_and_combine(in_path, out_path, new_column_name)
output.txt
Part Number Serial Number Asset Number
PART1 SERIAL1 PART1-SERIAL1
,PART2 SERIAL2 PART2-SERIAL2
, PART3 SERIAL3 PART3-SERIAL3
References:
https://www.programiz.com/python-programming/methods/string/format
https://www.programiz.com/python-programming/methods/string/strip

Print the complete line which includes a search word; the EOL is a dot, not a line feed

I have a long text (Winter's Tale). Now I want to search for the word 'Luzifer', and then the complete line which includes the word 'Luzifer' should be printed. By complete line I mean everything between two dots.
My script is printing 'Luzifer' and all the following words up to the end-of-line dot. But I want to have the full line.
For example, the text line is:
'Today Luzifer has a bad day. And he is ill'
My script is printing: 'Luzifer has a bad day.'
But I need the complete line, including 'Today'.
Is there a function or way to read back?
Here is my script:
#!/usr/bin/python3.6
# coding: utf-8
import re

def suchen(regAusdruck, textdatei):
    f = open(textdatei, 'r', encoding='utf-8')
    rfctext = f.read()
    f.close()
    return re.findall(regAusdruck, rfctext)

pattern1 = r'\bLuzifer\b[^.;:!?]{2,}'
print(suchen(pattern1, "tale.txt"))
One of the most straightforward ways of handling this is to read in your entire text (hopefully it is not too big), split on '.', and then return the strings that contain your search word. For good measure, I think it will be useful to replace the newline characters with a space so that you don't have any strings broken into multiple lines.
def suchen(regAusdruck, textdatei):
    with open(textdatei, 'r', encoding='utf-8') as f:
        entire_text = f.read()
    entire_text = entire_text.replace('\n', ' ')  # replace newlines with space
    sentences = entire_text.split('.')
    return [sentence for sentence in sentences if regAusdruck in sentence]
    # Alternatively...
    # return list(filter(lambda x: regAusdruck in x, sentences))

print(suchen('Luzifer', "tale.txt"))
If you really need to use a regular expression (which may be the case for more complicated searches) a modification is only needed in the return statement.
import re

def suchen(regAusdruck, textdatei):
    with open(textdatei, 'r', encoding='utf-8') as f:
        entire_text = f.read()
    entire_text = entire_text.replace('\n', ' ')  # replace newlines with space
    sentences = entire_text.split('.')
    # We assume you passed in a compiled regular expression object.
    return [sentence for sentence in sentences if regAusdruck.search(sentence)]
    # Alternatively...
    # return list(filter(regAusdruck.search, sentences))

print(suchen(re.compile(r'\bluzifer\b', flags=re.IGNORECASE), "tale.txt"))
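If you prefer to stay closer to your original single re.findall call, one possible pattern (a sketch, not the only option) extends the match backwards as well as forwards within the sentence:
import re

def suchen(regAusdruck, textdatei):
    with open(textdatei, 'r', encoding='utf-8') as f:
        rfctext = f.read().replace('\n', ' ')
    # strip() removes any leading space left over from the previous sentence.
    return [treffer.strip() for treffer in re.findall(regAusdruck, rfctext)]

# Match everything before and after the word, up to sentence-ending punctuation.
pattern1 = r'[^.;:!?]*\bLuzifer\b[^.;:!?]*'
print(suchen(pattern1, "tale.txt"))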

Opening a text file and adding to a dictionary

1) I'm looking to open a text file with values separated by colons like this:
Name : Daniel
Age : 12
Gender : Male
...
How do I open this text file in Python and add everything to a dictionary so that it ends up like this:
dictionary = {"Name": "Daniel", "Age": "12", "Gender": "Male", ...}
2) I then want the user to be able to search for a key, let's say "Name" and then the program outputs "Daniel". How can I do this?
A suggestion:
output_dict = {}
with open("file.txt", "r") as file:
    lines = file.readlines()
for line in lines:
    line = line.strip().replace(" : ", " ")
    words = line.split(' ')
    for i in range(0, len(words) - 1, 2):
        output_dict[words[i]] = words[i + 1]
print(output_dict)
And what is the separator between lines?
If it is a line break, you can use file.readlines():
yourFile = open("file.txt", "r")
lines = yourFile.readlines()
output = {}
for line in lines:
    # BE CAREFUL! Use line[:-2] if Windows line break (\r\n)
    # You can also do l = line.replace('\r', "").replace("\n", "")
    # which is better, because it is cross-platform and cross-format
    l = line[:-1]
    output[l.split(':')[0]] = l.split(':')[1]
Explanation:
yourFile.readlines() reads the file and returns it as a list like ["line1\n", "line2\n"].
We then loop over the lines, and for each line:
We cut off the line-break character(s) (\n, \r\n or \r, depending on the OS; it should be only \n, but check).
We split the string on the colon: "Name:Daniel".split(":") returns ['Name', 'Daniel'].
We add it to the dictionary with the dictionary['key'] = 'value' syntax.
It should work, but be careful: the spaces around the colon stay!
To remove them, you have to use string.replace ("Name : Daniel".replace(" ", "") will return "Name:Daniel").
And to get the name back, once you have the dictionary, nothing is simpler: dictionary["Name"] outputs "Daniel".
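For completeness, a compact sketch that builds the dictionary by splitting each line on the first colon and stripping the surrounding spaces, then answers part 2) with a simple lookup (the file name follows the answers above):
output = {}
with open("file.txt", "r") as f:
    for line in f:
        if ":" not in line:
            continue                      # skip blank or malformed lines
        key, value = line.split(":", 1)   # split only on the first colon
        output[key.strip()] = value.strip()

wanted = input("Which key do you want to look up? ")
print(output.get(wanted, "Key not found"))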
