How can I delete these characters in a csv file using python? - python-3.x

I have code that generates a table and prints it as a csv file, but when I run it, the output also contains certain characters like quotation marks and parentheses.
I currently don't have pandas, so a solution that does not use it would be greatly appreciated. I know it should be something "simple", since it's only a formatting issue. Below is the piece of code that prints the table, along with my current and desired results:
Code:
def PrintAsCsv(table):
    for r in table:
        print((r[0], r[1], r[3], r[5], r[6], r[7], r[8]))
Current results in the header of the table:
('Ssid' 'Vlan' 'Connected Time' 'Rssi' 'Date' 'Wap Name' 'Device Name')
Desired results in the header of the table:
Ssid Vlan Connected Time Rssi Date Wap Name Device Name

As your fields contain spaces, you'll want to have a different separator, e.g. a comma (default for csv):
def PrintAsCsv(table):
    for r in table:
        print(','.join((r[0], r[1], r[3], r[5], r[6], r[7], r[8])))
Output:
Ssid,Vlan,Connected Time,Rssi,Date,Wap Name,Device Name
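If a field can itself contain a comma, or isn't a string (','.join() raises TypeError on numbers), the standard-library csv module handles both without pandas. A minimal sketch; `print_as_csv` and its `out` parameter are hypothetical names, and the column indices are copied from PrintAsCsv above:

```python
import csv
import sys


def print_as_csv(table, out=sys.stdout):
    """Write selected columns of each row as CSV (hypothetical helper)."""
    # lineterminator="\n" so each row ends like a normal print() line
    writer = csv.writer(out, lineterminator="\n")
    for r in table:
        # Same column selection as PrintAsCsv; numbers are converted to text by the writer,
        # and fields containing a comma or quote are quoted only when needed
        writer.writerow([r[0], r[1], r[3], r[5], r[6], r[7], r[8]])
```

For the header row this prints the same line as the join version; a field such as `Smith, J` would come out as `"Smith, J"`, which is what lets a CSV reader split the line correctly again.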

Related

Getting KeyError for pandas df column name that exists

I have
data_combined = pd.read_csv("/path/to/creole_data/data_combined.csv", sep=";", encoding='cp1252')
So, when I try to access these rows:
data_combined = data_combined[(data_combined["wals_code"]=="abk") &(data_combined["wals_code"]=="aco")]
I get a KeyError 'wals_code'. I then checked my list of col names with
print(data_combined.columns.tolist())
and saw the col name 'wals_code' in the list. Here's the first few items from the print out.
[',"wals_code","Order of subject, object and verb","Order of genitive and noun","Order of adjective and noun","Order of adposition and NP","Order of demonstrative and noun","Order of numeral and noun","Order of RC and noun","Order of degree word and adjective"]
Anyone have a clue what is wrong with my file?
The problem is the delimiter you're using when reading the CSV file. With sep=';', you instruct read_csv to use semicolons (;) as the column separator (for both cells and column headers), but your columns printout shows that the file actually uses commas (,).
If you look carefully, you'll notice that the printout is actually a list containing one long string, not a list of individual strings, one per column name.
So, use sep=',' instead of sep=';' (or just omit it entirely as , is the default value for sep):
data_combined = pd.read_csv("/path/to/creole_data/data_combined.csv", encoding='cp1252')
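If you're not sure which delimiter a file uses, the standard-library csv.Sniffer can guess it from a sample before you call read_csv. A minimal sketch; `detect_delimiter` is a hypothetical helper, the candidate set ",;\t" is an assumption, and Sniffer is heuristic, so double-check its answer on unusual files:

```python
import csv


def detect_delimiter(path, encoding="cp1252", candidates=",;\t", sample_size=4096):
    """Guess a CSV file's delimiter from its first few KB (hypothetical helper)."""
    with open(path, newline="", encoding=encoding) as f:
        sample = f.read(sample_size)
    # Raises csv.Error if no candidate appears consistently across the sampled lines
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
```

The result can then be passed on, e.g. `pd.read_csv(path, sep=detect_delimiter(path), encoding='cp1252')`.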

Deleting a particular column/row from a CSV file using python

I want to delete a particular row based on user input that matches a value in one column.
Let's say I get an employee ID and want to delete all its corresponding values in that row.
I'm not sure how to approach this problem; other sources suggest copying all values to a temporary csv file and iterating again.
Since these are very primitive requirements, I would just do it manually.
Read it line by line - if you want to delete the current line, just don't write it back.
If you want to delete a column, for each line, parse it as csv (using the module csv - do not use .split(',')!) and discard the correct column.
The upside of these solutions is that they're very light on memory and about as fast as they can be at runtime.
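The column case from this answer, as a minimal sketch: stream the file row by row with the csv module, write each row minus one column to a temporary file, then swap it in. `delete_column` and the zero-based `col_index` are hypothetical names; the temporary file is created next to the original so os.replace() can swap it in place:

```python
import csv
import os
import tempfile


def delete_column(path, col_index):
    """Remove the column at zero-based col_index from every row (hypothetical helper)."""
    folder = os.path.dirname(os.path.abspath(path))
    # delete=False so the finished temp file survives being closed and can replace the original
    with open(path, newline="") as src, tempfile.NamedTemporaryFile(
        "w", newline="", dir=folder, suffix=".csv", delete=False
    ) as tmp:
        writer = csv.writer(tmp)
        for row in csv.reader(src):
            # Slicing tolerates short rows: a row without that column is written unchanged
            writer.writerow(row[:col_index] + row[col_index + 1:])
    os.replace(tmp.name, path)
```

Only one row is held in memory at a time, and a quoted field such as `"a, b"` stays intact because the csv module, not split(','), does the parsing.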
That's pretty much the way to do it.
Something like:
import shutil

file_path = "test.csv"

# Creates a test file
data = ["Employee ID,Data1,Data2",
        "111,Something,Something",
        "222,Something,Something",
        "333,Something,Something"]
with open(file_path, 'w') as write_file:
    for item in data:
        write_file.write(item + "\n")
# /Creates a test file

input("Look at the test.csv file if you like, close it, then press enter.")

employee_ID = "222"
with open(file_path) as read_file:
    with open("temp_file.csv", 'w') as temp_file:
        for line in read_file:
            # Skip (don't copy) any line containing the ID; every other line is kept
            if employee_ID in line:
                continue
            temp_file.write(line)
shutil.move("temp_file.csv", file_path)
If you have other data that may match the employee ID, then you'll have to parse the line and check the employee ID column specifically.
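A minimal sketch of that stricter check, assuming the ID sits in the first column as in the test file above; `delete_rows_by_id` and `id_column` are hypothetical names. Comparing the parsed field exactly avoids deleting rows where "222" merely appears elsewhere, such as inside "1222" or in a data field:

```python
import csv
import os
import tempfile


def delete_rows_by_id(path, employee_id, id_column=0):
    """Drop every row whose id_column field equals employee_id (hypothetical helper)."""
    folder = os.path.dirname(os.path.abspath(path))
    with open(path, newline="") as src, tempfile.NamedTemporaryFile(
        "w", newline="", dir=folder, suffix=".csv", delete=False
    ) as tmp:
        writer = csv.writer(tmp)
        for row in csv.reader(src):
            # Exact match on one column only; blank lines parse as [] and are dropped too
            if not row or (len(row) > id_column and row[id_column] == employee_id):
                continue
            writer.writerow(row)
    os.replace(tmp.name, path)
```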

Remove double quotes while printing string in dataframe to text file

I have a dataframe which contains one column with multiple strings. Here is what the data looks like:
Value
EU-1050-22345,201908 XYZ DETAILS, CD_123_123;CD_123_124,2;1
There are almost 100,000 such rows in the dataframe. I want to write this data into a text file.
For this, I tried the following:
df.to_csv(filename, header=None,index=None,mode='a')
But I am getting the entire string in quotes when I do this. The output I obtain is:
"EU-1050-22345,201908 XYZ DETAILS, CD_123_123;CD_123_124,2;1"
But what I want is:
EU-1050-22345,201908 XYZ DETAILS, CD_123_123;CD_123_124,2;1 -> No Quotes
I also tried this:
df.to_csv(filename,header=None,index=None,mode='a',
quoting=csv.QUOTE_NONE)
However, I get an error that an escapechar is required. If I add escapechar='/' to the code, I get '/' in multiple places (but no quotes). I don't want the '/' either.
Is there any way I can remove the quotes while writing to a text file WITHOUT adding any other escape characters?
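Based on the OP's comment, the culprit is the commas inside the value, not the semicolons: with quoting=csv.QUOTE_NONE, any field containing the delimiter (a comma by default) has to be escaped, which is why an escapechar is demanded. Using tabs as the delimiter sidesteps this, and I no longer get the unwanted \ characters.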
import pandas as pd
import csv
df = pd.DataFrame(columns=['col'])
df.loc[0] = "EU-1050-22345,201908 XYZ DETAILS, CD_123_123;CD_123_124,2;1"
# No quotechar/escapechar needed: the value contains no tabs (Python 3.11+ also rejects "" for these)
df.to_csv("out.csv", sep="\t", quoting=csv.QUOTE_NONE)
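Why the delimiter matters can be reproduced with the standard-library csv module, which pandas' to_csv builds on: with csv.QUOTE_NONE, a field containing the delimiter must be escaped, and with no escapechar the writer raises csv.Error. A minimal sketch; `write_unquoted` is a hypothetical helper:

```python
import csv
import io


def write_unquoted(rows, delimiter):
    """Serialize rows with quoting disabled (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, quoting=csv.QUOTE_NONE, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


value = "EU-1050-22345,201908 XYZ DETAILS, CD_123_123;CD_123_124,2;1"

# Tab-delimited: the value contains no tab, so nothing needs escaping
print(write_unquoted([[value]], "\t"), end="")

# Comma-delimited: the embedded commas need escaping, which QUOTE_NONE can't do without escapechar
try:
    write_unquoted([[value]], ",")
except csv.Error as exc:
    print("csv.Error:", exc)
```

The semicolons pass through untouched in both cases; only the character chosen as delimiter triggers escaping.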
Original Answer:
According to this answer, you need to specify escapechar="\\" to use csv.QUOTE_NONE.
Have you tried:
df.to_csv("out.csv", sep=",", quoting=csv.QUOTE_NONE, escapechar="\\")  # embedded commas are written as \,
I was able to write a df to a csv using a single space as the separator, without "quotes" around the strings, by replacing the existing in-string spaces in the dataframe with non-breaking spaces before writing it as a csv.
df = df.applymap(lambda x: str(x).replace(' ', u"\u00A0"))
df.to_csv(outpath+filename, header=True, index=None, sep=' ', mode='a')
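I couldn't use a tab-delimited file for what I was writing output for, though that solution also works by passing quoting=csv.QUOTE_NONE to df.to_csv() (older answers also pass quotechar="" and escapechar="", which Python 3.11+ rejects). On pandas 2.1+, applymap is deprecated in favor of DataFrame.map.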
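Since the frame here holds a single column of ready-made strings, another option is to bypass the CSV writer entirely and write each value as its own line, so no quoting or escaping can be added. A minimal sketch; `append_lines` is a hypothetical helper, and you would pass it something like df["Value"] (column name taken from the sample data above):

```python
def append_lines(values, filename, encoding="utf-8"):
    """Append each value as one line of text, with no CSV quoting (hypothetical helper)."""
    # Mode "a" mirrors mode='a' in the question's to_csv call
    with open(filename, "a", encoding=encoding) as f:
        for v in values:
            # str() so non-string cells (numbers, NaN) don't raise TypeError
            f.write(str(v) + "\n")
```

This only makes sense while the data really is one column; with several columns you are back to choosing a delimiter that never occurs inside the values.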

Writing in columns of an excel file from a list of texts

I have a list of texts (reviews_train) which I gathered from a text file (train.txt).
reviews_train = []
for line in open('C:\\Users\\Dell\\Desktop\\New Beginnings\\movie_data\\train.txt', 'r', encoding="utf8"):
    reviews_train.append(line.strip())
Suppose reviews_train = ["Nice movie","Bad film",....]
I have another result.csv file which looks like
company year
a 2000
b 2001
.
.
.
What I want to do is add another column text to the existing file to look something like this.
company year text
a 2000 Nice movie
b 2001 Bad film
.
.
.
The items of the list should get appended in the new column one after the other.
I am really new to Python. Can someone please tell me how to do it? Any help is really appreciated.
EDIT: My question is not just about adding another column in the .csv file. The column should have the texts in the list appended row wise.
EDIT: I used the solution given by #J_H but I get this error
Use zip():
import csv

def get_rows(infile='result.csv'):
    with open(infile, newline='') as fin:
        sheet = csv.reader(fin)
        for row in sheet:
            yield list(row)

def get_lines(infile=r'C:\Users\Dell\Desktop\New Beginnings\movie_data\train.txt'):
    return open(infile).readlines()

for row, line in zip(get_rows(), get_lines()):
    row.append(line.rstrip('\n'))  # drop the trailing newline that readlines() keeps
    print(row)
With those 3-element rows in hand,
you could e.g. writerow().
EDIT
The open() in your question mentions 'r' and encoding='utf8',
which I dropped, assuming open() defaults to both.
'r' is indeed the default mode, but the default encoding is not UTF-8 everywhere:
it comes from the locale, which on Windows is typically CP1252.
PEP 529 (Python 3.6) switched only the Windows filesystem (filename) encoding to UTF-8,
and PEP 540's UTF-8 Mode is opt-in (-X utf8 or PYTHONUTF8=1).
So on a host like yours you will certainly want to keep the explicit encoding:
return open(infile, encoding='utf8').readlines()
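To finish the job with writerow(), here is a minimal sketch that writes a new file instead of printing. It assumes result.csv is comma-separated with a header row and that train.txt has no header, so the header is handled separately (zipping all rows directly would attach the first review to the header row). `add_text_column`, the output filename, and the "text" header are hypothetical:

```python
import csv


def add_text_column(csv_path, txt_path, out_path, header="text", encoding="utf8"):
    """Append one line of txt_path to each data row of csv_path (hypothetical helper)."""
    with open(csv_path, newline="", encoding=encoding) as fin, \
            open(txt_path, encoding=encoding) as ftxt, \
            open(out_path, "w", newline="", encoding=encoding) as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        # The header row gets the new column name instead of a review
        writer.writerow(next(reader) + [header])
        # zip() stops at the shorter input, so extra rows or lines are silently dropped
        for row, line in zip(reader, ftxt):
            writer.writerow(row + [line.rstrip("\n")])
```

For example, `add_text_column('result.csv', r'C:\Users\Dell\Desktop\New Beginnings\movie_data\train.txt', 'result_with_text.csv')`. Writing to a separate file avoids overwriting result.csv while it is still being read.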

How do I prevent my program from writing incorrect input to a CSV file?

I want to input data into a CSV file and have so far accomplished that. The only problem I have now encountered is that when the user enters incorrect information for a letters-only input (e.g. name = b12f), the program correctly prompts the user again to enter only letters (e.g. name = biff).
However, when I look at the CSV file the incorrect value was recorded instead of the corrected input. How do I correct this?
import csv

surname = input('\nSurname:')
surname_checker(surname)  # I wrote a separate function that correctly checks the input to judge if it is correct and reprompts if not
s = surname.capitalize()
with open('Try.csv', 'a', newline='') as csvfile:
    appendCSV = csv.writer(csvfile)
    appendCSV.writerow([s])
    csvfile.close()
Outcome if b12f is entered then Biff correctly entered:
Surname:b12f
Please define Surname only in letters
Surname:biff
>>> print(surname)
b12f
This is not CSV specific. Your use of print(surname) should be a sign - it's not doing anything with the CSV file, but it's still printing the wrong thing.
Your custom function needs to return the corrected surname, and you need to replace the stored surname with it.
Something like surname = surname_checker(surname)
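The asker's surname_checker isn't shown, so this is a hypothetical sketch of one that returns the corrected value. The isalpha() rule is an assumption about what "only in letters" means, and the `ask` parameter exists only so the prompt can be swapped out (for example in tests); it defaults to input:

```python
def surname_checker(surname, ask=input):
    """Re-prompt until the surname is letters only, then return it (hypothetical)."""
    # str.isalpha() is False for "b12f" and for the empty string
    while not surname.isalpha():
        print("Please define Surname only in letters")
        surname = ask("\nSurname:")
    return surname
```

In the original script, `surname = surname_checker(surname)` then runs before `s = surname.capitalize()`, so the CSV receives "Biff" rather than "B12f".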
