How to read a text file and insert the data into next line on getting \n character - python-3.x

I have a text file where the data is comma-delimited, with a literal \n character sequence in between. I would like to start a new line of output each time a \n is encountered.
text file sample:
'what,is,your,name\n','my,name,is,david.hough\n','i,am,a,software,prof\n','what,is,your,name\n','my,name,is,eric.knot\n','i,am,a,software,prof\n','what,is,your,name\n','my,name,is,fisher.cold\n','i,am,a,software,prof\n',..
expected:
I need the output in the below form.
'what,is,your,name',
'my,name,is,david.hough',
'i,am,a,software,prof',
Tried:
file1 = open("test.text", "r")
Lines = file1.readlines()
for line in Lines:
    print(line)
result:
'what,is,your,name\n','my,name,is,david.hough\n','i,am,a,software,prof\n','what,is,your,name\n','my,name,is,eric.knot\n','i,am,a,software,prof\n','what,is,your,name\n','my,name,is,fisher.cold\n','i,am,a,software,prof\n',..

Well, my comment does exactly what you asked: it breaks lines at \n. Your data is structured quite oddly, but if you really want the expected result, you can use a regex:
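import re
with open("test.text", "r") as file1:
    # Drop the literal backslash-n sequences, then collect each quoted item with its trailing comma.
    Lines = re.findall(r'\'.*?\',', file1.read().replace("\\n", ""))
for line in Lines:
    print(line)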
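A variant of the regex approach above can be sketched as a small function. This is only an illustration under the assumption that every item is wrapped in single quotes and ends with the two literal characters backslash and n, as in the sample; the name split_quoted_items is hypothetical and not from the original posts.

```python
import re

def split_quoted_items(text):
    """Return the single-quoted items in text, without a trailing literal backslash-n."""
    items = re.findall(r"'(.*?)'", text)
    # "\\n" is the two-character sequence backslash + n, not a real newline.
    return [item[:-2] if item.endswith("\\n") else item for item in items]

# Raw string, so \n below stays as two literal characters like in the file.
sample = r"'what,is,your,name\n','my,name,is,david.hough\n','i,am,a,software,prof\n',"
for item in split_quoted_items(sample):
    print(f"'{item}',")
```

Each item is printed on its own line in the quoted, comma-terminated form shown in the expected output.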

You don't need to push the data onto the next line manually. The \n does that work when you run the code.
I guess the problem is that you used quotes too frequently. Try using a single pair of quotes, with \n after each sentence and no whitespace around it:
'what,is,your,name\nmy,name,is,david.hough\ni,am,a,software,prof'

Related

How to modify and print list items in python?

I am a beginner in Python, working on a small piece of logic. I have a text file with one HTML link per line. I have to read each line of the file and print each link with the same prefix and suffix,
so that the output looks like this:
<item>LINK1</item>
<item>LINK2</item>
<item>LINK3</item>
and so on.
I have tried this code, but something is wrong in my approach:
def file_read(fname):
    with open(fname) as f:
        # content_list is the list that contains the lines read.
        content_list = f.readlines()
        for i in content_list:
            print(str("<item>") + i + str("</item>"))
file_read(r"C:\Users\mandy\Desktop\gd.txt")
In the output, the suffix was not where I expected it. As I am a beginner, can anyone sort this out for me?
<item>www.google.com
</item>
<item>www.bing.com
</item>
I think that when you use .readlines(), the end-of-line character is also kept in i.
If I understand you correctly, you want to print
<item>www.google.com</item>
Then try strip() (see https://www.journaldev.com/23625/python-trim-string-rstrip-lstrip-strip):
print(str("<item>") + i.strip() + str("</item>"))
When you use the readlines() method, each line it returns still ends with the newline character ("\n") from your file.
You can call .strip() on each line, which removes whitespace, including newline characters, from both ends, so the output is formatted correctly.
def file_read(fname):
    with open(fname) as f:
        # content_list is the list that contains the lines read.
        content_list = f.readlines()
        for i in content_list:
            print(str("<item>") + i.strip() + str("</item>"))
file_read(r"C:\Users\mandy\Desktop\gd.txt")
I assume you wanted to print in the following way:
<item>www.google.com</item>
readlines() leaves an extra '\n' at the end of each line. To avoid that, you can strip the string, and for printing you can use f-strings.
with open(fname) as f:
    lin = f.readlines()
    for i in lin:
        print(f"<item>{i.strip()}</item>")
Another method:
with open('stacksource') as f:
    lin = f.read().splitlines()
    for i in lin:
        print(f"<item>{i}</item>")
Here splitlines() splits the text at line boundaries and returns a list of lines without the line endings.
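The stripping idea from the answers above can also be sketched as a small generator. This is an illustration only: the name wrap_items and the tag parameter are hypothetical, and io.StringIO stands in for the real file so the example runs on its own.

```python
import io

def wrap_items(lines, tag="item"):
    """Yield each non-empty line wrapped in <tag>...</tag>, without its line ending."""
    for line in lines:
        link = line.strip()  # removes the trailing "\n" and any surrounding spaces
        if link:             # skip blank lines
            yield f"<{tag}>{link}</{tag}>"

# An in-memory stand-in for the text file of links.
fake_file = io.StringIO("www.google.com\nwww.bing.com\n")
for item in wrap_items(fake_file):
    print(item)
```

Iterating over the file object directly avoids building the whole list with readlines(), which may matter for large files.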

Splitting File contents by regex

As the title states, I need to split a file by regex in Python.
The layout of the .txt file is as follows:
[text1]
file contents I need
[some text2]
more file contents I need
[more text 3]
last bit of file contents I need
I originally tried splitting the file like so:
re.split(r'\[[A-Za-z]+\]\n', data)
The problem with doing it this way was that it wouldn't match the headers that had spaces in the text within the brackets.
I then tried using a wildcard: re.split(r'\[(.*?)\]\n', data)
The problem I ran into with this was that it would split the file contents as well. What's the best way to get the following result:
['file contents I need','more file contents I need','last bit of file contents I need']?
Thanks in advance.
Instead of using re.split, you could use a capturing group with re.findall which will return the group 1 values.
In the group, match all the lines that do not start with the [.....] pattern
^\[[^][]*]\r?\n\s*(.*(?:\r?\n(?!\[[^][]*]).*)*)
In parts:
^ Start of line
\[[^][]*] Match an opening [, then any characters other than [ and ], then the closing ]
\r?\n\s* Match a newline and optional whitespace characters
( Capture group 1
.* Match any character except a newline, 0+ times
(?: Non-capturing group
\r?\n(?!\[[^][]*]).* Match the next line if it does not start with the [...] pattern, using a negative lookahead (?!
)* Close the group and repeat it 0+ times to get all the lines
) Close group 1
Example code
import re
regex = r"^\[[^][]*]\r?\n\s*(.*(?:\r?\n(?!\[[^][]*]).*)*)"
data = ("[text1]\n\n"
        "file contents I need\n\n"
        "[some text2]\n\n"
        "more file contents I need\n\n"
        "[more text 3]\n\n"
        "last bit of file contents I need\n"
        "last bit of file contents I need")
matches = re.findall(regex, data, re.MULTILINE)
print(matches)
Output
['file contents I need\n', 'more file contents I need\n', 'last bit of file contents I need\nlast bit of file contents I need']
Given:
txt='''\
[text1]
file contents I need
[some text2]
more file contents I need
multi line at that
[more text 3]
last bit of file contents I need'''
(Which could be from a file...)
You can do:
>>> [e.strip() for e in re.findall(r'(?<=\])([\s\S]*?)(?=\[|\s*\Z)', txt)]
['file contents I need', 'more file contents I need\nmulti line at that', 'last bit of file contents I need']
You can also use re.finditer to locate each block of interest:
import re
# ur_file is the path to your file
with open(ur_file) as f:
    pattern = r'^\s*\[[^]]*\]([\s\S]*?)(?=^\s*\[[^]]*\]|\Z)'
    for i, block in enumerate(re.finditer(pattern, f.read(), flags=re.M)):
        print(i, block.group(1))
Each block's leading and trailing whitespace can be dealt with as desired...
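For completeness, re.split itself can still be used. When the pattern contains a capturing group, re.split also returns the captured text, which is why the (.*?) attempt in the question put the header names into the result. A minimal sketch without a capturing group follows, assuming each header sits alone on its own line; the name split_blocks is hypothetical.

```python
import re

def split_blocks(data):
    """Split data at [header] lines and return the stripped, non-empty blocks."""
    # No capturing group here, so re.split returns only the text between headers.
    parts = re.split(r'^\[[^\]]*\]\r?\n', data, flags=re.MULTILINE)
    return [part.strip() for part in parts if part.strip()]

data = (
    "[text1]\n"
    "file contents I need\n"
    "[some text2]\n"
    "more file contents I need\n"
    "[more text 3]\n"
    "last bit of file contents I need"
)
print(split_blocks(data))
```

The empty string that re.split produces before the first header is dropped by the final filter.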

remove white spaces from the list

I am reading from a CSV file and appending values from its rows to a list. There are some whitespace entries that are causing issues in my script. I need to remove them from the list, which I have managed to do. However, can someone please advise whether this is the right way to do it?
import csv
ip_list = []
with open('name.csv') as open_file:
    read_file = csv.DictReader(open_file)
    for read_rows in read_file:
        ip_list.append(read_rows['column1'])
# filter(None, ...) drops falsy values such as empty strings
ip_list = list(filter(None, ip_list))
print(ip_list)
Or would a function be preferable?
Here is a simple way to read a CSV file and store it in a list.
L = []  # Create an empty list for the main array
for line in open('log.csv'):  # Open the file and read all the lines
    x = line.rstrip()  # Strip the \n from each line
    L.append(x.split(','))  # Split each line into a list and add it to the multidimensional array
print(L)
For example, this CSV file
This is the first line, Line1
This is the second line, Line2
This is the third line, Line3
would produce this output (each row is a list, and the space after each comma is kept):
[['This is the first line', ' Line1'], ['This is the second line', ' Line2'], ['This is the third line', ' Line3']]
Because CSV means comma-separated values, you can split each line on commas.
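One point worth noting about the question's approach: filter(None, ...) removes empty strings, but a value made up only of spaces is still truthy and survives. A hedged sketch that strips each value first is shown below; the function name read_column and the sample data are hypothetical, and io.StringIO stands in for name.csv.

```python
import csv
import io

def read_column(csv_file, column):
    """Return the stripped, non-empty values of one column from an open CSV file."""
    reader = csv.DictReader(csv_file)
    # (row[column] or "") guards against None, which DictReader uses for missing fields.
    values = ((row[column] or "").strip() for row in reader)
    return [value for value in values if value]

# In-memory stand-in for the CSV file, with blank and whitespace-only cells.
sample = io.StringIO("column1,column2\n10.0.0.1,a\n  ,b\n 10.0.0.2 ,c\n,d\n")
print(read_column(sample, "column1"))
```

Wrapping the logic in a function, as the question suggests, makes it easy to reuse for other columns or files.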

using title() creates an extra line in python

Purpose: writing code to capitalize the first letter of each word in a file.
The steps include opening the file in read mode and calling title() on each line.
When the output is printed, there is an extra blank line between each line of the file.
For example:
if the content is
one two three four
five six seven eight
the output is (with a blank line after each line):
One Two Three Four
Five Six Seven Eight
I'm not sure why the blank lines show up there.
I used strip() together with title() to get rid of them, but I would like to know why they appear.
inputfile = input("Enter the file name:")
openedfile = open(inputfile, 'r')
for line in openedfile:
    capitalized = line.title()
    print(capitalized)
The above code prints the output with an added blank line after each line.
I solved it using the code below:
inputfile = input("Enter the file name:")
openedfile = open(inputfile, 'r')
for line in openedfile:
    capitalized = line.title().strip()
    print(capitalized)
I expected the capitalized words to print without blank lines using just title(), not title().strip().
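The blank lines most likely come from two newlines being written per line: each line read from the file still ends with "\n", title() leaves that character in place, and print() then adds its own newline. A small sketch of this follows, with io.StringIO standing in for the real file and title_lines as a hypothetical helper name.

```python
import io

def title_lines(lines):
    """Yield each line title-cased, with its trailing newline removed."""
    for line in lines:
        yield line.rstrip("\n").title()

# In-memory stand-in for the input file.
fake_file = io.StringIO("one two three four\nfive six seven eight\n")
for line in title_lines(fake_file):
    print(line)  # print() supplies the only newline now
```

An alternative with the same effect is to keep the line as read and call print(line.title(), end=""), so that print() does not add a second newline.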

Use Python to parse comma separated string with text delimiter coming from stdin

I have a csv file that is being fed to my Python script via stdin.
This is a comma separated file with quotations as text delimiter.
Here is an example line:
457,"Last,First",NYC
My script so far splits each line by looking for commas, but how do I make it aware of the text delimiter quotes?
My current script:
import sys
for line in sys.stdin:
    line = line.strip()
    fields = line.split(',')
    print(fields)
The code splits the name into two since it does not recognize the quotations enclosing that text field. I need the name to remain as a single element.
If it matters, the data is being fed through stdin within a hadoop-streaming program.
Thanks!
Well, you could do it more manually, with something like this:
import sys
for line in sys.stdin:
    row = []
    enclosed = False  # True while inside a quoted field
    word = ''
    for character in line.rstrip('\n'):
        if character == '"':
            enclosed = not enclosed
        elif character == ',' and not enclosed:
            row.append(word)
            word = ''
        else:
            word += character
    row.append(word)  # the last field has no trailing comma
    print(row)
I haven't tested this or thought about it for long, but it seems to me it could work. Someone more familiar with Pythonic syntax could probably find something better, though ;)
Attempting to answer my own question: if I read the documentation right, it may be possible to pass the streaming input straight to csv.reader, like so:
import csv
import sys
for line in csv.reader(sys.stdin):
    print(line)
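To illustrate how csv.reader treats the quoted field, here is a small sketch that feeds it the example line from the question. io.StringIO is used as a stand-in for sys.stdin so the snippet runs on its own; csv.reader accepts any iterable of lines.

```python
import csv
import io

# In-memory stand-in for sys.stdin, holding the sample line from the question.
stream = io.StringIO('457,"Last,First",NYC\n')

rows = list(csv.reader(stream))
for fields in rows:
    print(fields)  # ['457', 'Last,First', 'NYC']
```

The default dialect uses the double quote as the quote character, so the comma inside "Last,First" is not treated as a separator and the name stays a single element.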

Resources