Field delimiter ',' found while expecting record delimiter '\n' - python-3.x

I am encountering this error while loading a csv file into Snowflake. How do I handle it whenever it comes up? There are a few records with the same error, and I don't want to delete any records from the csv file.
Alternatively, how can I implement a try-except here? I want to either correct or skip the records in the csv that have errors.
import snowflake.connector

tableName = 'F58155'
ctx = snowflake.connector.connect(
    user='-',
    password='-',
    account='-')
cs = ctx.cursor()
ctx.cursor().execute("USE DATABASE STORE_PROFILE_LANDING")
ctx.cursor().execute("USE SCHEMA PUBLIC")
# upload the local files to the table stage
ctx.cursor().execute("PUT file:///temp/data/{tableName}/* @%{tableName}".format(tableName=tableName))
ctx.cursor().execute("truncate table {tableName}".format(tableName=tableName))
# the CSV format options belong inside the SQL text, not in .format()
ctx.cursor().execute(
    "COPY INTO {tableName} "
    "FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = ',' "
    "FIELD_OPTIONALLY_ENCLOSED_BY = '\"')".format(tableName=tableName))
ctx.close()
Here is the image of line 178 where I am getting the error.

This is explained in the article below: the most common cause of this error is a field value that itself contains the delimiter character.
https://community.snowflake.com/s/article/Copy-Error-Message-Field-delimiter-found-while-expecting-record-delimiter-n
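If the goal is to keep the good rows and skip only the records that fail to parse, rather than deleting anything from the file, one option is the ON_ERROR copy option. FIELD_OPTIONALLY_ENCLOSED_BY = '"' handles fields that are quoted because they contain the delimiter; ON_ERROR = 'CONTINUE' loads the rows that do parse and skips the rest. A sketch, not a drop-in fix, reusing the cursor and table name from the question:
# Load the rows that parse and skip the ones that raise errors,
# instead of failing the whole COPY.
cs.execute(
    "COPY INTO {t} "
    "FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '\"') "
    "ON_ERROR = 'CONTINUE'".format(t=tableName))
# Each result row reports rows parsed, rows loaded, and the first error,
# so the skipped records can be reviewed rather than silently lost.
for row in cs.fetchall():
    print(row)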

Related

One person in my dataframe keeps showing up with \ufeff in their name when I print to the console

I have python code that loads a group of exam results. Each exam is saved in its own csv file.
import glob
import pandas as pd

files = glob.glob('Exam *.csv')
frame = []
files1 = glob.glob('Exam 1*.csv')
for file in files:
    frame.append(pd.read_csv(file, index_col=[0], encoding='utf-8-sig'))
for file in files1:
    frame.append(pd.read_csv(file, index_col=[0], encoding='utf-8-sig'))
There is one person in the whole dataframe whose name column shows up as
\ufeffStudents Name
It happens for every single exam. I tried using the encoding argument, but that doesn't fix the issue. I am out of ideas. Does anyone have anything?
That character is the BOM, or "Byte Order Mark."
There are several ways to resolve it.
First, I suggest adding the engine parameter (for example, engine='python') to pd.read_csv() when reading the csv files:
pd.read_csv(file, index_col=[0], engine='python', encoding='utf-8-sig')
Secondly, you can simply remove it by replacing it with an empty string (''):
df['student_name'] = df['student_name'].apply(lambda x: x.replace("\ufeff", ""))
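If the BOM has crept into the header row as well (it often attaches to the first column name), the same replacement can be applied to the column labels. A small sketch; student_name is the column from the example above:
df.columns = [col.replace("\ufeff", "") for col in df.columns]  # clean the labels
df['student_name'] = df['student_name'].str.replace("\ufeff", "", regex=False)  # vectorized value cleanup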

Remove double quotes while printing string in dataframe to text file

I have a dataframe which contains one column with multiple strings. Here is what the data looks like:
Value
EU-1050-22345,201908 XYZ DETAILS, CD_123_123;CD_123_124,2;1
There are almost 100,000 such rows in the dataframe. I want to write this data into a text file.
For this, I tried the following:
df.to_csv(filename, header=None, index=None, mode='a')
But I am getting the entire string in quotes when I do this. The output I obtain is:
"EU-1050-22345,201908 XYZ DETAILS, CD_123_123;CD_123_124,2;1"
But what I want is:
EU-1050-22345,201908 XYZ DETAILS, CD_123_123;CD_123_124,2;1 -> No Quotes
I also tried this:
df.to_csv(filename, header=None, index=None, mode='a',
          quoting=csv.QUOTE_NONE)
However, I get an error that an escapechar is required. If I add escapechar='/' to the code, I get '/' in multiple places (but no quotes). I don't want the '/' either.
Is there any way I can remove the quotes while writing to a text file WITHOUT adding any other escape characters?
Based on OP's comment, I believe the semicolon is messing things up; I no longer get unwanted \ characters when using tabs to delimit the csv.
import pandas as pd
import csv
df = pd.DataFrame(columns=['col'])
df.loc[0] = "EU-1050-22345,201908 XYZ DETAILS, CD_123_123;CD_123_124,2;1"
df.to_csv("out.csv", sep="\t", quoting=csv.QUOTE_NONE, quotechar="", escapechar="")
Original Answer:
According to this answer, you need to specify escapechar="\\" to use csv.QUOTE_NONE.
Have you tried:
df.to_csv("out.csv", sep=",", quoting=csv.QUOTE_NONE, quotechar="", escapechar="\\")
I was able to write a df to a csv using a single space as the separator, with the quotes around strings removed, by replacing the existing in-string spaces in the dataframe with non-breaking spaces before writing it as a csv.
df = df.applymap(lambda x: str(x).replace(' ', u"\u00A0"))
df.to_csv(outpath+filename, header=True, index=None, sep=' ', mode='a')
I couldn't use a tab-delimited file for the output I was writing, though that solution also works with the additional keywords to df.to_csv(): quoting=csv.QUOTE_NONE, quotechar="", escapechar="".
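If the dataframe really contains just one pre-formatted string column, another option is to bypass csv quoting entirely and write the strings directly. A sketch, assuming the column is named 'Value' as in the question:
# Each row is already a finished line of output, so no quoting or
# escaping rules need to apply; write the strings verbatim.
with open(filename, 'a', encoding='utf-8') as f:
    for value in df['Value']:
        f.write(value + '\n')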

How to convert Navigable String to File Object

I am trying to get some data from a website (using the modules named requests & BeautifulSoup) and print it in a text file but every time I try to do so, it says the following:
TypeError: descriptor 'write' requires a 'file' object but received a 'NavigableString'
I have tried using the csv library to save the data, but since I couldn't add the data to the csv line by line, I decided to send all the output to a text file and then take out the data I require.
file_object = open("name-list.txt", "w")  # Opening the file
name = soup.find(class_='table-responsive')  # Extracting the data
name_list = name.find_all('td')  # Refining the data
for final in name_list:
    all = final.contents[0]  # Final result
    file.write(all)  # This is where the error comes
file.close()
When I use print(all) in the for loop, I get the output I need, which consists of multi-line text including the names, age, gender, etc. of the people from the table on the website. But when I try to write that output to the text file, the error pops up.
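The traceback suggests write is being looked up on the name file (the built-in file type in Python 2) instead of the opened file_object, and a NavigableString should be converted to str before writing. A sketch of the likely fix, assuming the same soup object as above:
file_object = open("name-list.txt", "w")
name = soup.find(class_='table-responsive')
name_list = name.find_all('td')
for final in name_list:
    cell = final.contents[0]             # a NavigableString, not a str
    file_object.write(str(cell) + "\n")  # write via the opened file object
file_object.close()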

Save Tweets as .csv, Contains String Literals and Entities

I have tweets saved in JSON text files. I have a friend who wants tweets containing keywords, and the tweets need to be saved in a .csv. Finding the tweets is easy, but I run into two problems and am struggling with finding a good solution.
Sample data are here. I have included the .csv file that is not working as well as a file where each row is a tweet in JSON format.
To get the tweets into a dataframe, I use pd.io.json.json_normalize. It works smoothly and handles nested dictionaries well, but pd.to_csv does not work because, as far as I can tell, it does not handle escape characters well. Some of the tweets contain '\n' in the text field, and pandas writes real new lines when that happens.
No problem: I process tweets['text'] to remove '\n'. The resulting file still has too many rows, 1863 compared to the 1388 it should have. I then modified my code to replace all of the escape characters:
for ch in ['\n', '\r', '\\', '\'', '\"', '\a', '\b', '\f', '\t', '\v']:
    tweets['text'] = [item.replace(ch, '') for item in tweets['text']]
Same result: pd.to_csv saves a file with more rows than there are tweets. I could apply the replacements to all columns, but that is clunky.
Fine, don't use pandas. Writing with with open(outpath, 'w') as f: and so on creates a .csv file with the correct number of rows. Reading the file back, however, fails, whether with pd.read_csv or line by line.
It fails because of how Twitter handles entities. If a tweet's text contains a url, mention, hashtag, media, or link, then Twitter returns a dictionary that contains commas. When pandas flattens the tweet, the commas get preserved within a column, which is good. But when the data are read back in, pandas splits what should be one column into multiple columns. For example, a column might look like [{'screen_name': 'ProfOsinbajo', 'name': 'Prof Yemi Osinbajo', 'id': 2914442873, 'id_str': '2914442873', 'indices': [0, 13]}], so splitting on commas creates too many columns:
[{'screen_name': 'ProfOsinbajo',
'name': 'Prof Yemi Osinbajo',
'id': 2914442873,
'id_str': '2914442873',
'indices': [0,
13]}]
That is the outcome when I use with open(outpath) as f: as well. With that approach I have to split the lines myself, so I split on commas. Same problem: I do not want to split on commas when they occur inside a list.
I want those data to be treated as one column when saved to file or read from file. What am I missing? In terms of the data at the repository above, I want to convert forstackoverflow2.txt to a .csv with as many rows as tweets. Call this file A.csv, and let's say it has 100 columns. When opened, A.csv should also have 100 columns.
I'm sure there are details I've left out, so please let me know.
Using the csv module works. The code below writes the file out as a .csv while counting the lines, then reads it back in and counts the lines again.
The results match, and opening the .csv in Excel also gives 191 columns and 1338 lines of data.
import json
import csv

with open('forstackoverflow2.txt') as f,\
     open('out.csv','w',encoding='utf-8-sig',newline='') as out:
    data = json.loads(next(f))
    print('columns',len(data))
    writer = csv.DictWriter(out,fieldnames=sorted(data))
    writer.writeheader()              # write header
    writer.writerow(data)             # write the first line of data
    for i,line in enumerate(f,2):     # start line count at two
        data = json.loads(line)
        writer.writerow(data)
    print('lines',i)

with open('out.csv',encoding='utf-8-sig',newline='') as f:
    r = csv.DictReader(f)
    lines = list(r)
    print('readback columns',len(lines[0]))
    print('readback lines',len(lines))
Output:
columns 191
lines 1338
readback lines 1338
readback columns 191
@Mark Tolonen's answer is helpful, but I ended up going a separate route. When saving the tweets to file, I removed all \r, \n, \t, and \0 characters from anywhere in the JSON. Then I saved the file as tab-separated, so that commas in fields like location or text do not confuse a read function.
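A sketch of that approach, assuming tweets is the flattened dataframe produced by pd.io.json.json_normalize (the output filename is illustrative):
import re

# Strip the troublesome control characters from every string cell, then
# save tab-separated so embedded commas are not read as delimiters.
tweets = tweets.applymap(lambda x: re.sub(r'[\r\n\t\0]', '', x)
                         if isinstance(x, str) else x)
tweets.to_csv('tweets.tsv', sep='\t', index=False)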

How to remove a line break from CSV output

I am writing this code to separate information which will be uploaded to a database using the resulting CSV file. If I receive a spreadsheet with first, middle, and last names all in the same column, the code splits them into three separate columns. However, my output file has some extra line breaks or returns, which for now I have just deleted manually in the CSV so the data could be uploaded. How can I remove these within my code? I have some ideas, but none seem to work. I tried using line.replace, but I do not fully understand how it is supposed to work, so it failed.
My code:
import csv

with open('c:\\users\\cmobley\\desktop\\split for crm check.csv', "r") as readfile:
    name_split = []
    for line in readfile:
        whitespace_split = line.split(" ")
        remove_returns = (line.replace('/n', "") for line in whitespace_split)
        name_split.append(remove_returns)

print(name_split)

with open('c:\\users\\cmobley\\desktop\\testblank.csv', 'w', newline='\n') as csvfile:
    writer = csv.writer(csvfile, delimiter=',',
                        quotechar='"', quoting=csv.QUOTE_MINIMAL)
    writer.writerows(name_split)
Thanks for any help that can be provided! I am still trying to learn Python.
You have a forward slash rather than the backslash needed for escape sequences.
Change to:
remove_returns = (line.replace('\n', "") for line in whitespace_split)
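For completeness, a sketch of the whole script with that fix applied; a list comprehension replaces the generator expression so name_split holds concrete rows, and newline='' is used on the output file as the csv docs recommend (paths are from the question):
import csv

with open('c:\\users\\cmobley\\desktop\\split for crm check.csv', 'r') as readfile:
    name_split = []
    for line in readfile:
        whitespace_split = line.split(" ")
        # '\n' (backslash) is the newline escape; '/n' matches nothing
        remove_returns = [part.replace('\n', "") for part in whitespace_split]
        name_split.append(remove_returns)

with open('c:\\users\\cmobley\\desktop\\testblank.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=',', quotechar='"',
                        quoting=csv.QUOTE_MINIMAL)
    writer.writerows(name_split)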
