Python: How to keep the csv module from adding quotation marks while changing the delimiter in a CSV file - python-3.x

I have written a script to convert the delimiter in a CSV file from a comma to a pipe symbol, but it doesn't remove the extra quotation marks added by the csv module.
The script is as follows:
import csv
filename = "sample.csv"
with open(filename, mode='rU') as fin, open('c:\\files\\sample.txt', mode='w') as fout:
    reader = csv.DictReader(fin)
    writer = csv.DictWriter(fout, reader.fieldnames, delimiter='|')
    writer.writeheader()
    writer.writerows(reader)
Case 1:
For example, if one of the fields in the CSV contains "hi" hows you,good, then the CSV stores it as """hi" hows you,good"" and Python writes """hi" hows you,good"" to the text file instead of "hi" hows you,good.
Case 2:
For a field like hi hows,you the CSV stores it as "hi hows,you", and after running the script it is saved as hi hows,you in the text file, which is correct.
Please could you help me solve case 1?
Example CSV file, as it appears when opened in Notepad:
ID,IDN,DESC,TNO
A019,1,"""Pins "" is dangerous",2
B020,1,"""ache"",headache/fever-like",3
C021,2,stomach cancer,1
D231,3,"hair,""fall""",1
Result after running the script:
ID|IDN|DESC|TNO
A019|1|"""Pins "" is dangerous"|2
B020|1|"""ache"",headache/fever-like"|3
C021|2|stomach cancer|1
D231|3|"hair,""fall"""|1
I want the result to be:
ID|IDN|DESC|TNO
A019|1|"Pins " is dangerous|2
B020|1|"ache",headache/fever-like|3
C021|2|stomach cancer|1
D231|3|hair,"fall"|1

This works:
writer = csv.DictWriter(fout, reader.fieldnames, delimiter='|', quoting=csv.QUOTE_NONE, quotechar="")
Defining the quoting as "no quoting": quoting=csv.QUOTE_NONE
Defining the quote char as "no quote char": quotechar=""
Result:
ID|IDN|DESC|TNO
A019|1|"Pins " is dangerous|2
B020|1|"ache",headache/fever-like|3
C021|2|stomach cancer|1
D231|3|hair,"fall"|1
Note that quoting is useful: disabling it exposes you to the joy of "delimiters inside fields". It's up to you to make sure that doesn't happen.
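For reference, here is a minimal end-to-end sketch combining the question's script with those options (the paths are placeholders; newline='' is used instead of the deprecated 'rU' mode, as the csv docs suggest, and is not part of the original answer):

import csv

filename = "sample.csv"
with open(filename, newline='') as fin, open('c:\\files\\sample.txt', mode='w', newline='') as fout:
    reader = csv.DictReader(fin)
    # QUOTE_NONE plus an empty quotechar stops the writer from re-quoting
    # fields that contain commas or double quotes
    writer = csv.DictWriter(fout, reader.fieldnames, delimiter='|',
                            quoting=csv.QUOTE_NONE, quotechar='')
    writer.writeheader()
    writer.writerows(reader)

If a field ever contains the new '|' delimiter, the writer will raise an error because no escapechar is set, so check your data first.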

Related

extract words from a text file and print next line

Sample input:
in parsing a text file .txt = ["'blah.txt'", "'blah1.txt'", "'blah2.txt'" ]
The expected output, in another text file out_path.txt:
blah.txt
blah1.txt
blah2.txt
Code that I tried; it just appends "[]" to the file. I also tried a Perl one-liner replacing the double and single quotes.
read_out_fh = open('out_path.txt', "r")
for line in read_out_fh:
    for word in line.split():
        curr_line = re.findall(r'"(\[^"]*)"', '\n')
        print(curr_line)
This happens because when you read a file, each line comes in as a string, not as a list, even if it is formatted like a list. That is why you get [] from re.findall, and iterating with for line in read_in_fh gives you the whole line as one string rather than the individual items. So I first wrote something to transform the string into a list; while doing that I also removed the "" and '' as you mentioned, and then wrote the result into a new file, example.txt.
Note: change the file names to match your own files.
read_out_fh = open('file.txt', "r")
for line in read_out_fh:
    line = line.strip("[]").replace('"', '').replace("'", '').split(", ")
    with open("example.txt", "w") as output:
        for word in line:
            # print(word)
            output.write(word + '\n')
example.txt (output file):
blah.txt
blah1.txt
blah2.txt
The code below works for the example you gave in the question:
# Content of textfile.txt:
asdasdasd=["'blah.txt'", "'blah1.txt'", "'blah2.txt'"]asdasdasd

# Code:
import re

read_in_fh = open('textfile.txt', "r")
write_out_fh = open('out_path.txt', "w")
for line in read_in_fh:
    find_list = re.findall(r'\[(".*?"*)\]', line)
    for element in find_list[0].split(","):
        element_formatted = element.replace('"', '').replace("'", "").strip()
        write_out_fh.write(element_formatted + "\n")
write_out_fh.close()
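A minimal alternative sketch, not part of the original answers (same hypothetical file names): extract the bracketed list with a regex and let ast.literal_eval parse it, so the quoting is handled by Python itself rather than by string replacement.

import ast
import re

with open('textfile.txt') as read_in_fh, open('out_path.txt', 'w') as write_out_fh:
    for line in read_in_fh:
        match = re.search(r'\[.*?\]', line)
        if not match:
            continue
        # literal_eval turns the bracketed text into a real Python list
        for element in ast.literal_eval(match.group(0)):
            write_out_fh.write(element.strip("'\"") + '\n')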

Is there a way to save a txt file under a new name after reading/writing the file?

I am trying to run a Python program that opens a template multiple times and, while running through a loop, saves multiple copies of the txt template under distinct file names.
An example problem is included below. The example template takes the following form:
Null Null
Null
This is the test
But there is still more text.
The code I've made to do a quick edit is as follows:
longStr = (r"C:\Users\jrwaller\Documents\Automated Eve\NewTest.txt")
import fileinput
for line in fileinput.FileInput(longStr,inplace=1):
if "This" in line:
line=line.replace(line,line+"added\n")
print(line, end='')
The output of the code correctly adds the new line "added" to the text file:
Null Null
Null
This is the test
added
But there is still more text.
However, I want to save this new text under a new file name, say "New Test Edited", while keeping a copy of the old txt file available for further edits.
Here is a working example for you:
longStr = (r"C:\Users\jrwaller\Documents\Automated Eve\NewTest.txt")
with open(longStr) as old_file:
with open(r"C:\Users\jrwaller\Documents\Automated Eve\NewTestEdited.txt", "w") as new_file:
for line in old_file:
if "This" in line:
line=line.replace(line,line+"added\n")
new_file.write(line)
A simple file read-and-write operation, with context managers to close the files when you're finished.
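If you need several distinct copies, one possible extension of the same pattern is to build the output name inside a loop (the file names and the number of copies here are placeholders, not part of the original answer):

template = r"C:\Users\jrwaller\Documents\Automated Eve\NewTest.txt"
for i in range(3):
    out_path = rf"C:\Users\jrwaller\Documents\Automated Eve\NewTestEdited_{i}.txt"
    with open(template) as old_file, open(out_path, "w") as new_file:
        for line in old_file:
            if "This" in line:
                # same edit as above: append "added" on a new line
                line = line + "added\n"
            new_file.write(line)

The template file is never modified, so it stays available for further edits.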

How to convert a tab delimited text file to a csv file in Python

I have the following problem:
I want to convert a tab-delimited text file to a CSV file. The text file is the SentiWS dictionary, which I want to use for sentiment analysis ( https://github.com/MechLabEngineering/Tatort-Analyzer-ME/tree/master/SentiWS_v1.8c ).
The code I used to do this is the following:
txt_file = r"SentiWS_v1.8c_Positive.txt"
csv_file = r"NewProcessedDoc.csv"
in_txt = csv.reader(open(txt_file, "r"), delimiter = '\t')
out_csv = csv.writer(open(csv_file, 'w'))
out_csv.writerows(in_txt)
This code writes everything into a single column, but I need the data split into three columns as intended by the file itself. There is also a blank line under each row of data and I don't know why.
I want the data to be in this form:
Row1 Row2 Row3
Word Data Words
Word Data Words
instead of
Row1
Word,Data,Words
Word,Data,Words
Can anyone help me?
import pandas
This will convert the tab-delimited text file into a DataFrame:
dataframe = pandas.read_csv("SentiWS_v1.8c_Positive.txt", delimiter="\t")
Then write the DataFrame to CSV:
dataframe.to_csv("NewProcessedDoc.csv", encoding='utf-8', index=False)
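If the source file has no header row, read_csv would otherwise treat the first data line as column names; a possible variant (the column names here are only illustrative) is:

import pandas

# header=None keeps the first data line from being consumed as column names
dataframe = pandas.read_csv("SentiWS_v1.8c_Positive.txt", delimiter="\t",
                            header=None, names=["Row1", "Row2", "Row3"])
dataframe.to_csv("NewProcessedDoc.csv", encoding='utf-8', index=False)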
Try this:
import csv

txt_file = r"SentiWS_v1.8c_Positive.txt"
csv_file = r"NewProcessedDoc.csv"
with open(txt_file, "r") as in_text:
    in_reader = csv.reader(in_text, delimiter='\t')
    with open(csv_file, "w", newline='') as out_csv:
        out_writer = csv.writer(out_csv)
        for row in in_reader:
            out_writer.writerow(row)
Note that newline='' belongs to open(), not to csv.writer().
There is also a blank line under each row of data and I don't know why.
You're probably using a file created or edited in a Windows-based text editor. According to the Python 3 csv module docs:
If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use \r\n linendings on write an extra \r will be added. It should always be safe to specify newline='', since the csv module does its own (universal) newline handling.

How to remove a line break from CSV output

I am writing this code to separate information which will be uploaded to a database using the resulting CSV file. If I receive a spreadsheet with first, middle, and last name all in the same column, the code splits them into three separate columns. However, my output file has some extra line breaks or returns, which for now I removed manually from the CSV so the data could be uploaded. How can I remove these within my code? I have some ideas but none seem to work. I tried using line.replace, but I do not fully understand how it is supposed to work, so it failed.
My code:
import csv
with open('c:\\users\\cmobley\\desktop\\split for crm check.csv', "r") as readfile:
    name_split = []
    for line in readfile:
        whitespace_split = line.split(" ")
        remove_returns = (line.replace('/n', "") for line in whitespace_split)
        name_split.append(remove_returns)

print(name_split)

with open('c:\\users\cmobley\\desktop\\testblank.csv', 'w', newline='\n') as csvfile:
    writer = csv.writer(csvfile, delimiter=',',
                        quotechar='"', quoting=csv.QUOTE_MINIMAL)
    writer.writerows(name_split)
Thanks for any help that can be provided! I am still trying to learn Python.
You have a forward slash rather than the backslash needed for escape sequences.
Change to:
remove_returns = (line.replace('\n', "") for line in whitespace_split)
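For context, a minimal corrected sketch of the whole loop might look like this (paths are shortened for the example; building a list instead of a generator and using newline='' are extra adjustments, not part of the original answer):

import csv

name_split = []
with open('split for crm check.csv', 'r') as readfile:
    for line in readfile:
        whitespace_split = line.split(' ')
        # '\n' (with a backslash) is the escape sequence for a newline
        remove_returns = [word.replace('\n', '') for word in whitespace_split]
        name_split.append(remove_returns)

with open('testblank.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    writer.writerows(name_split)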

Use Python to parse comma separated string with text delimiter coming from stdin

I have a csv file that is being fed to my Python script via stdin.
This is a comma separated file with quotations as text delimiter.
Here is an example line:
457,"Last,First",NYC
My script so far splits each line by looking for commas, but how do I make it aware of the text-delimiter quotes?
My current script:
import sys

for line in sys.stdin:
    line = line.strip()
    line.split(',')
    print line
The code splits the name into two since it does not recognize the quotations enclosing that text field. I need the name to remain as a single element.
If it matters, the data is being fed through stdin within a hadoop-streaming program.
Thanks!
Well, you could do it more manually, with something like this:
row = []
enclosed = False
word = ''
for character in sys.stdin.read():
    if character == '"':
        enclosed = not enclosed
    elif character == ',' and not enclosed:
        row.append(word)
        word = ''
    else:
        word += character
I haven't tested it nor thought about it for too long, but it seems to me it could work. Probably someone more into Pythonic syntax could find something better for doing the trick, though ;)
Attempting to answer my own question. If I read right, it may be possible to feed the streaming input into csv.reader like so:
import csv
for line in csv.reader(sys.stdin):
    print line
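As a minimal self-contained sketch of that idea (written in Python 3 syntax, whereas the snippets above use Python 2's print statement):

import csv
import sys

# Each row comes back as a list; quoted fields such as "Last,First" stay intact
for row in csv.reader(sys.stdin):
    print(row)

For the example line, piping 457,"Last,First",NYC into this script prints ['457', 'Last,First', 'NYC'].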
