How to read desired rows from large CSV files in python

I am trying to look up data in a CSV file and pass it on to another piece of Python code.
The CSV file has 100000+ rows, and I want to pass on only the rows I choose.
Actual code:
import csv

input_file = 'trusted.csv'
users = []
with open(input_file, encoding='UTF-8') as f:
    rows = csv.reader(f, delimiter=",", lineterminator="\n")
    next(rows, None)  # skip the header row
    for row in rows:
        user = {}
        user['username'] = row[0]
        user['id'] = int(row[1])
        user['access_hash'] = int(row[2])
        user['name'] = row[3]
        users.append(user)
Parsing data to code:
g_index = input("Enter a Number: ")
target_group=groups[int(g_index)]
target_group.access_hash
The actual code parses all the rows from the CSV file. I am looking for a way to pass on only a range of rows, for example rows 11 to 20 or rows 50 to 100.
I tried the code below, but received an error when the data was passed to the other Python code:
import csv

input_file = 'lucky280.csv'
start = 10
stop = start + 10
users = []
with open(input_file, encoding='UTF-8') as f:
    rows = csv.reader(f, delimiter=",", lineterminator="\n")
    for i, line in enumerate(rows):
        if i >= start:
            users.append(line)  # appends a raw list, not a dict
        if i > stop:
            break
    for row in rows:
        user = {}
        user['username'] = row[0]
        user['id'] = int(row[1])
        user['access_hash'] = int(row[2])
        user['name'] = row[3]
        users.append(user)
ERROR :
Traceback (most recent call last):
File "", line 10, in
print ("Adding {}".format(user['id']))
TypeError: list indices must be integers or slices, not str
With the actual code, reading the file works fine, but it parses all the data in the file.
Please help!
After a recommendation I also tried:
import csv
from itertools import islice

input_file = 'lucky280.csv'
users = []
with open(input_file, encoding='UTF-8') as f:
    rows = csv.reader(f, delimiter=",", lineterminator="\n")
    rowiter = islice(rows, 3, 5)
    for item in rowiter:
        for row in rows:
            user = {}
            user['username'] = row[0]
            user['id'] = int(row[1])
            user['access_hash'] = int(row[2])
            user['name'] = row[3]
            users.append(user)
got the below error
IndexError Traceback (most recent call last)
<ipython-input-108-9f4099c2e53d> in <module>()
10 user = {}
11 user['username'] = row[0]
---> 12 user['id'] = int(row[1])
13 user['access_hash'] = int(row[2])
14 user['name'] = row[3]
IndexError: list index out of range

You can use islice from itertools.
Here is a sample CSV file:
X Y
0 21 test3
1 8 test1
2 75 test1
3 26 test2
4 98 test3
5 63 test3
6 65 test3
7 39 test3
8 74 test1
9 26 test2
Suppose I want only rows 3 and 4 (zero-based, counting the header line as row 0):
>>> import csv
>>> from itertools import islice
>>> with open('test.csv') as f:
...     rows = csv.reader(f)
...     rowiter = islice(rows, 3, 5)
...     for item in rowiter:
...         print(item)
gives me the following output
['2', '75', 'test1']
['3', '26', 'test2']
Update
import csv
from itertools import islice

input_file = 'trusted.csv'
start = 10
stop = start + 10
users = []
with open(input_file, encoding='UTF-8') as f:
    rows = csv.reader(f, delimiter=",", lineterminator="\n")
    rowiter = islice(rows, start, stop)  # only rows start..stop-1 are parsed
    for row in rowiter:
        user = {}
        user['username'] = row[0]
        user['id'] = int(row[1])
        user['access_hash'] = int(row[2])
        user['name'] = row[3]
        users.append(user)
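
The same idea can be wrapped in a small reusable function. This is only a sketch: the name read_user_rows, the skip_header flag and the decision to skip short lines are assumptions for illustration, not part of the original answer. It assumes the four-column layout (username, id, access_hash, name) from the question, and counts data rows from 0 after the header.

```python
import csv
from itertools import islice


def read_user_rows(path, start, stop, skip_header=True):
    """Return data rows start..stop-1 as dicts (hypothetical helper)."""
    users = []
    with open(path, encoding="UTF-8", newline="") as f:
        rows = csv.reader(f)
        if skip_header:
            next(rows, None)  # consume the header so row 0 is the first data row
        # islice stops reading once `stop` rows have been consumed
        for row in islice(rows, start, stop):
            if len(row) < 4:
                continue  # skip blank or short lines instead of raising IndexError
            users.append({
                "username": row[0],
                "id": int(row[1]),
                "access_hash": int(row[2]),
                "name": row[3],
            })
    return users
```

With this numbering, read_user_rows('trusted.csv', 10, 20) would return the 11th to 20th data rows, and a second call with (50, 100) the 51st to 100th. Each call reopens the file and reads from the top, which is usually acceptable for a file of this size.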

Related

Iterate through excel files' sheets and append if sheet names share common part in Python

Let's say we have many excel files with the multiple sheets as follows:
Sheet 1: 2021_q1_bj
a b c d
0 1 2 23 2
1 2 3 45 5
Sheet 2: 2021_q2_bj
a b c d
0 1 2 23 6
1 2 3 45 7
Sheet 3: 2019_q1_sh
a b c
0 1 2 23
1 2 3 45
Sheet 4: 2019_q2_sh
a b c
0 1 2 23
1 2 3 40
I want to append sheets into one whenever the last part of their names (split by _) is the same, across all Excel files. For example, sheet 1 is appended to sheet 2 because both end in bj; if another Excel file also has sheets ending in bj, they are appended as well. The same logic applies to sheets 3 and 4.
How could I achieve that in Pandas or other Python packages?
The expected result for current excel file would be:
bj:
a b c d
0 1 2 23 2
1 2 3 45 5
2 1 2 23 6
3 2 3 45 7
sh:
a b c
0 1 2 23
1 2 3 45
2 1 2 23
3 2 3 40
Code for reference:
import os, glob
import pandas as pd

files = glob.glob("*.xlsx")
for each in files:
    dfs = pd.read_excel(each, sheet_name=None, index_col=[0])
    df_out = pd.concat(dfs.values(), keys=dfs.keys())
    for n, g in df_out.groupby(df_out.index.to_series().str[0].str.rsplit('_', n=1).str[-1]):
        g.droplevel(level=0).dropna(how='all', axis=1).reset_index(drop=True).to_excel(f'Out_{n}.xlsx', index=False)
Update:
You may download test excel files and final expected result from this link.

Try:
dfs = pd.read_excel('Downloads/WS_1.xlsx', sheet_name=None, index_col=[0])
df_out = pd.concat(dfs.values(), keys=dfs.keys())
for n, g in df_out.groupby(df_out.index.to_series().str[0].str.rsplit('_', n=1).str[-1]):
    g.droplevel(level=0).dropna(how='all', axis=1).reset_index(drop=True).to_excel(f'Out_{n}.xlsx')
Update
import os, glob
import pandas as pd

files = glob.glob("Downloads/test_data/*.xlsx")
writer = pd.ExcelWriter('Downloads/test_data/Output_file.xlsx', engine='xlsxwriter')
excel_dict = {}
for each in files:
    dfs = pd.read_excel(each, sheet_name=None, index_col=[0])
    # Collect the sheets of every file; sheets with the same name in
    # different files overwrite each other here
    excel_dict.update(dfs)
# Concatenate the sheets collected from all files, keyed by sheet name
df_out = pd.concat(excel_dict.values(), keys=excel_dict.keys())
for n, g in df_out.groupby(df_out.index.to_series().str[0].str.rsplit('_', n=1).str[-1]):
    g.droplevel(level=0).dropna(how='all', axis=1).reset_index(drop=True).to_excel(writer, index=False, sheet_name=f'{n}')
writer.close()  # close() also saves the workbook
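
The grouping key in the groupby above is simply the text after the last underscore of each sheet name. A minimal plain-Python sketch of that rule, using the sheet names from the question (the function name group_sheet_names is made up for illustration):

```python
from collections import defaultdict


def group_sheet_names(sheet_names):
    """Group sheet names by the part after the last underscore."""
    groups = defaultdict(list)
    for name in sheet_names:
        suffix = name.rsplit("_", 1)[-1]  # '2021_q1_bj' -> 'bj'
        groups[suffix].append(name)
    return dict(groups)


names = ["2021_q1_bj", "2021_q2_bj", "2019_q1_sh", "2019_q2_sh"]
for suffix, members in group_sheet_names(names).items():
    print(suffix, members)
```

The same key could be used to bucket the DataFrames of several workbooks into one list per suffix before calling pd.concat on each list, which would avoid relying on sheet names being unique across files.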

I have implemented the whole process and get the expected final result with the code below.
Alternative or more concise solutions, or any advice, would be appreciated:
import os, glob
import pandas as pd
from pandas import ExcelWriter
from datetime import datetime

def save_xls(dict_df, path):
    writer = ExcelWriter(path)
    for key in dict_df:
        dict_df[key].to_excel(writer, sheet_name=key, index=False)
    writer.close()  # close() also saves; writer.save() was removed in pandas 2.0

root_dir = './original/'
for root, subFolders, files in os.walk(root_dir):
    # print(subFolders)
    for file in files:
        if '.xlsx' in file:
            file_path = os.path.join(root_dir, file)
            print(file)
            f = pd.ExcelFile(file_path)
            dict_dfs = {}
            for sheet_name in f.sheet_names:
                df_new = f.parse(sheet_name = sheet_name)
                print(sheet_name)
                ## get the year, quarter and city from the sheet name
                year, quarter, city = sheet_name.split("_")
                df_new["year"] = year
                df_new["quarter"] = quarter
                df_new["city"] = city
                dict_dfs[sheet_name] = df_new
            save_xls(dict_df = dict_dfs, path = './add_columns_from_sheet_name/' + "new_" + file)
root_dir = './add_columns_from_sheet_name/'
list1 = []
df = pd.DataFrame()
for root, subFolders, files in os.walk(root_dir):
    # print(subFolders)
    for file in files:
        if '.xlsx' in file:
            # print(file)
            city = file.split('_')[0]
            file_path = os.path.join(root_dir, file)
            # print(file_path)
            dfs = pd.read_excel(file_path, sheet_name=None)
            df_out = pd.concat(dfs.values(), keys=dfs.keys())
            for n, g in df_out.groupby(df_out.index.to_series().str[0].str.rsplit('_', n=1).str[-1]):
                print(n)
                timestr = datetime.utcnow().strftime('%Y%m%d-%H%M%S%f')[:-3]
                g.droplevel(level=0).dropna(how='all', axis=1).reset_index(drop=True).to_excel(f'./output/{n}_{timestr}.xlsx', index=False)
file_set = set()
file_dir = './output/'
file_list = os.listdir(file_dir)
for file in file_list:
    data_type = file.split('_')[0]
    file_set.add(data_type)
print(file_set)

file_dir = './output'
file_list = os.listdir(file_dir)
df1 = pd.DataFrame()
df2 = pd.DataFrame()
df3 = pd.DataFrame()
df4 = pd.DataFrame()
file_set = set()
for file in file_list:
    if '.xlsx' in file:
        # print(file)
        df_temp = pd.read_excel(os.path.join(file_dir, file))
        # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
        if 'bj' in file:
            df1 = pd.concat([df1, df_temp])
        elif 'sh' in file:
            df2 = pd.concat([df2, df_temp])
        elif 'gz' in file:
            df3 = pd.concat([df3, df_temp])
        elif 'sz' in file:
            df4 = pd.concat([df4, df_temp])
# function
def dfs_tabs(df_list, sheet_list, file_name):
    writer = pd.ExcelWriter(file_name, engine='xlsxwriter')
    for dataframe, sheet in zip(df_list, sheet_list):
        dataframe.to_excel(writer, sheet_name=sheet, startrow=0, startcol=0, index=False)
    writer.close()  # close() also saves the workbook

# list of dataframes and sheet names
dfs = [df1, df2, df3, df4]
sheets = ['bj', 'sh', 'gz', 'sz']
# run function
dfs_tabs(dfs, sheets, './final/final_result.xlsx')

How to make my pandas script export to csv

I have managed to get the output I want from the script below, but I am having trouble exporting it to a CSV using:
v.to_csv(n + '.csv', index=False)
I get this error:
Traceback (most recent call last):
  Python/CouponRedemptions/start.py", line 22, in <module>
    print(v['invoice_line_normal_price'])
  File "~/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 2902, in __getitem__
    indexer = self.columns.get_loc(key)
  File "~/.local/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2893, in get_loc
    raise KeyError(key) from err
KeyError: 'invoice_line_normal_price'
I think it is the way the DataFrame is structured: it cannot be exported in its current state. How would I go about making this work, or where can I start looking?
import pandas as pd
import re
r = pd.read_csv('cp.csv', low_memory=False)
r = r.filter(['shop_name','order_coupon_code','invoice_line_type','invoice_date','invoice_line_normal_price'])
r = r[r.order_coupon_code.notnull()]
r['invoice_line_normal_price'] = pd.to_numeric(r['invoice_line_normal_price'],errors = 'coerce')
n = input("Enter the coupon name: ")
nr = r[r.order_coupon_code.str.match(n,flags=re.IGNORECASE)]
nr = nr[nr.invoice_line_type.str.match('charge')]
nr = nr.sort_values('shop_name')
v = nr.groupby(['shop_name'])['invoice_line_normal_price'].value_counts().to_frame('counts')
print(v)

Example CSV data:
shop_name order_coupon_code invoice_line_type invoice_date invoice_line_normal_price moresome moreother hello
0 shop1 nv55 sell 01.01.2016 01:00:00.000 15.0 3 tt hi
1 shop2 nv44 quote 01.01.2016 02:00:00.000 22.0 4 rr hey
2 shop3 nv22 charge 01.01.2016 03:00:00.000 27.0 5 dd what
import pandas as pd
# The low_memory option is not properly deprecated, but it should be, since it does not actually do anything differently
r = pd.read_csv('cp.csv')
print(r)
# r = r.loc[:,['shop_name', 'order_coupon_code', 'invoice_line_type', 'invoice_date', 'invoice_line_normal_price']]
r = r.filter(['shop_name','order_coupon_code','invoice_line_type','invoice_date','invoice_line_normal_price'])
r = r[r.order_coupon_code.notnull()]
r['invoice_line_normal_price'] = pd.to_numeric(r['invoice_line_normal_price'],errors = 'coerce')
# Enter the coupon name: nv22
n = input("Enter the coupon name: ")
nr = r[r.order_coupon_code.str.contains(n.lower())]
nr = nr[nr.invoice_line_type.str.match('charge')]
nr = nr.sort_values('shop_name')
v = nr.groupby(['shop_name'])['invoice_line_normal_price'].value_counts().to_frame('counts')
print(v)
v.to_csv(n + '.csv', index=False)
The output:
                                     counts
shop_name invoice_line_normal_price
shop3     27.0                            1
To append further results to a single CSV file:
v.to_csv(n + '.csv', mode='a', index=False)
To append without writing the header again:
v.to_csv(n + '.csv', mode='a', index=False, header=False)
Note that the error below means the requested column name was not found, so check the column names in your CSV file:
get_loc raise KeyError(key) from err KeyError: 'invoice_line_normal_price'
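
One thing worth checking in the script above: after groupby(...).value_counts().to_frame('counts'), the shop name and the price are index levels rather than columns, so index=False would leave only the counts in the file. The sketch below uses made-up sample data and an in-memory buffer instead of a real file; it is an illustration of that behaviour, not the asker's exact data.

```python
import io

import pandas as pd

# Made-up sample rows standing in for the filtered coupon data
df = pd.DataFrame({
    "shop_name": ["shop3", "shop3", "shop1"],
    "invoice_line_normal_price": [27.0, 27.0, 15.0],
})
v = (df.groupby("shop_name")["invoice_line_normal_price"]
       .value_counts()
       .to_frame("counts"))

# The group keys live in the index, so the frame has a single column
print(list(v.columns))  # ['counts']

# reset_index() turns the index levels back into ordinary columns
flat = v.reset_index()
buf = io.StringIO()
flat.to_csv(buf, index=False)
print(buf.getvalue())
```

This would also explain why v['invoice_line_normal_price'] raises a KeyError on the grouped frame: that name is an index level there, and becomes a column again only after reset_index().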

IndexError multiprocessing.Pool

I'm getting an IndexError using multiprocessing to process parts of a pandas DataFrame in parallel. vacancies is a pandas DataFrame containing several vacancies, of which one column is the raw text.
import pickle
import numpy as np
import pandas as pd

def addSkillRelevance(vacancies):
    skills = pickle.load(open("skills.pkl", "rb"))
    vacancies['skill'] = ''
    vacancies['skillcount'] = 0
    vacancies['all_skills_in_vacancy'] = ''
    new_vacancies = pd.DataFrame(columns=vacancies.columns)
    for vacancy_index, vacancy_row in vacancies.iterrows():
        #Create a df for which each row is a found skill (with the other attributes of the vacancy)
        per_vacancy_df = pd.DataFrame(columns=vacancies.columns)
        all_skills_in_vacancy = []
        skillcount = 0
        for skill_index, skill_row in skills.iterrows():
            #Making the search for the skill in the text body a bit smarter
            spaceafter = ' ' + skill_row['txn_skill_name'] + ' '
            newlineafter = ' ' + skill_row['txn_skill_name'] + '\n'
            tabafter = ' ' + skill_row['txn_skill_name'] + '\t'
            #Statement that returns true if we find a variation of the skill in the text body
            if((spaceafter in vacancies.at[vacancy_index,'body']) or (newlineafter in vacancies.at[vacancy_index,'body']) or (tabafter in vacancies.at[vacancy_index,'body'])):
                #Adding the skill to the list of skills found in the vacancy
                all_skills_in_vacancy.append(skill_row['txn_skill_name'])
                #Increasing the skillcount
                skillcount += 1
                #Adding the skill to the row
                vacancies.at[vacancy_index,'skill'] = skill_row['txn_skill_name']
                #Add a row to the vacancy df where 1 row, means 1 skill
                per_vacancy_df = per_vacancy_df.append(vacancies.iloc[vacancy_index])
        #Adding the list of all found skills in the vacancy to each (skill) row
        per_vacancy_df['all_skills_in_vacancy'] = str(all_skills_in_vacancy)
        per_vacancy_df['skillcount'] = skillcount
        #Adds the individual vacancy df to a new vacancy df
        new_vacancies = new_vacancies.append(per_vacancy_df)
    return(new_vacancies)
def executeSkillScript(vacancies):
    from multiprocessing import Pool
    vacancies = vacancies.head(100298)
    num_workers = 47
    pool = Pool(num_workers)
    vacancy_splits = np.array_split(vacancies, num_workers)
    results_list = pool.map(addSkillRelevance,vacancy_splits)
    new_vacancies = pd.concat(results_list, axis=0)
    pool.close()
    pool.join()

executeSkillScript(vacancies)
The function addSkillRelevance() takes in a pandas DataFrame and outputs a pandas DataFrame (with more columns). For some reason, after finishing all the multiprocessing, I get an IndexError on results_list = pool.map(addSkillRelevance,vacancy_splits). I'm quite stuck as I don't know how to handle the error. Does anyone have tips as to why the IndexError is occurring?
The error:
IndexError Traceback (most recent call last)
<ipython-input-11-7cb04a51c051> in <module>()
----> 1 executeSkillScript(vacancies)
<ipython-input-9-5195d46f223f> in executeSkillScript(vacancies)
14
15 vacancy_splits = np.array_split(vacancies, num_workers)
---> 16 results_list = pool.map(addSkillRelevance,vacancy_splits)
17 new_vacancies = pd.concat(results_list, axis=0)
18
~/anaconda3/envs/amazonei_tensorflow_p36/lib/python3.6/multiprocessing/pool.py in map(self, func, iterable, chunksize)
264 in a list that is returned.
265 '''
--> 266 return self._map_async(func, iterable, mapstar, chunksize).get()
267
268 def starmap(self, func, iterable, chunksize=None):
~/anaconda3/envs/amazonei_tensorflow_p36/lib/python3.6/multiprocessing/pool.py in get(self, timeout)
642 return self._value
643 else:
--> 644 raise self._value
645
646 def _set(self, i, obj):
IndexError: single positional indexer is out-of-bounds
As per the suggestion

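The error is coming from this line:
per_vacancy_df = per_vacancy_df.append(vacancies.iloc[vacancy_index])
The error occurs because vacancy_index is an index label, not a position. Each chunk produced by np.array_split keeps the index labels of the original DataFrame, so in every chunk after the first the labels are larger than the chunk's length, and the positional .iloc lookup goes out of bounds. Use vacancies.loc[vacancy_index] (or the vacancy_row that iterrows() already yields) instead.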
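
A small sketch of the difference between labels and positions on a chunk of a DataFrame. The data is made up, and plain positional slices stand in for np.array_split here (an assumption for brevity); both keep the original index labels.

```python
import pandas as pd

df = pd.DataFrame({"body": ["a", "b", "c", "d"]})

# Two chunks of two rows each; the second keeps labels 2 and 3
chunks = [df.iloc[0:2], df.iloc[2:4]]
second = chunks[1]
print(list(second.index))  # [2, 3]

for label, row in second.iterrows():
    by_label = second.loc[label]  # label-based lookup works on the chunk
    assert by_label["body"] == row["body"]

try:
    second.iloc[2]  # position 2 does not exist in a 2-row chunk
except IndexError as exc:
    print("IndexError:", exc)

# Resetting the index makes labels and positions agree again
renumbered = second.reset_index(drop=True)
print(renumbered.iloc[0]["body"])  # c
```

Either switching to .loc or calling reset_index(drop=True) on each chunk at the top of the worker function should avoid the out-of-bounds lookup; which one fits better depends on whether the original labels are needed later.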

Cannot plot dataframe as barh because TypeError: Empty 'DataFrame': no numeric data to plot

I have been all over this site and google trying to solve this problem.
It appears as though I'm missing a fundamental concept in making a plottable dataframe.
I've tried to ensure that I have a column of strings for the "Teams" and a column of ints for the "Points"
Still I get: TypeError: Empty 'DataFrame': no numeric data to plot
import csv
import pandas
import numpy
import matplotlib.pyplot as plt
from matplotlib.ticker import StrMethodFormatter

set_of_teams = set()

def load_epl_games(file_name):
    with open(file_name, newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        raw_data = {"HomeTeam": [], "AwayTeam": [], "FTHG": [], "FTAG": [], "FTR": []}
        for row in reader:
            set_of_teams.add(row["HomeTeam"])
            set_of_teams.add(row["AwayTeam"])
            raw_data["HomeTeam"].append(row["HomeTeam"])
            raw_data["AwayTeam"].append(row["AwayTeam"])
            raw_data["FTHG"].append(row["FTHG"])
            raw_data["FTAG"].append(row["FTAG"])
            raw_data["FTR"].append(row["FTR"])
    data_frame = pandas.DataFrame(data=raw_data)
    return data_frame
def calc_points(team, table):
    points = 0
    for row_number in range(table["HomeTeam"].count()):
        home_team = table.loc[row_number, "HomeTeam"]
        away_team = table.loc[row_number, "AwayTeam"]
        if team in [home_team, away_team]:
            home_team_points = 0
            away_team_points = 0
            winner = table.loc[row_number, "FTR"]
            if winner == 'H':
                home_team_points = 3
            elif winner == 'A':
                away_team_points = 3
            else:
                home_team_points = 1
                away_team_points = 1
            if team == home_team:
                points += home_team_points
            else:
                points += away_team_points
    return points

def get_goals_scored_conceded(team, table):
    scored = 0
    conceded = 0
    for row_number in range(table["HomeTeam"].count()):
        home_team = table.loc[row_number, "HomeTeam"]
        away_team = table.loc[row_number, "AwayTeam"]
        if team in [home_team, away_team]:
            if team == home_team:
                scored += int(table.loc[row_number, "FTHG"])
                conceded += int(table.loc[row_number, "FTAG"])
            else:
                scored += int(table.loc[row_number, "FTAG"])
                conceded += int(table.loc[row_number, "FTHG"])
    return (scored, conceded)
def compute_table(df):
    raw_data = {"Team": [], "Points": [], "GoalDifference":[], "Goals": []}
    for team in set_of_teams:
        goal_data = get_goals_scored_conceded(team, df)
        raw_data["Team"].append(team)
        raw_data["Points"].append(calc_points(team, df))
        raw_data["GoalDifference"].append(goal_data[0] - goal_data[1])
        raw_data["Goals"].append(goal_data[0])
    data_frame = pandas.DataFrame(data=raw_data)
    data_frame = data_frame.sort_values(["Points", "GoalDifference", "Goals"], ascending=[False, False, False]).reset_index(drop=True)
    data_frame.index = numpy.arange(1,len(data_frame)+1)
    data_frame.index.names = ["Finish"]
    return data_frame

def get_finish(team, table):
    return table[table.Team==team].index.item()

def get_points(team, table):
    return table[table.Team==team].Points.item()
def display_hbar(tables):
    raw_data = {"Team": [], "Points": []}
    for row_number in range(tables["Team"].count()):
        raw_data["Team"].append(tables.loc[row_number+1, "Team"])
        raw_data["Points"].append(int(tables.loc[row_number+1, "Points"]))
    df = pandas.DataFrame(data=raw_data)
    #df = pandas.DataFrame(tables, columns=["Team", "Points"])
    print(df)
    print(df.dtypes)
    df["Points"].apply(int)
    print(df.dtypes)
    df.plot(kind='barh',x='Points',y='Team')

games = load_epl_games('epl2016.csv')
final_table = compute_table(games)
#print(final_table)
#print(get_finish("Tottenham", final_table))
#print(get_points("West Ham", final_table))
display_hbar(final_table)
The output:
Team Points
0 Chelsea 93
1 Tottenham 86
2 Man City 78
3 Liverpool 76
4 Arsenal 75
5 Man United 69
6 Everton 61
7 Southampton 46
8 Bournemouth 46
9 West Brom 45
10 West Ham 45
11 Leicester 44
12 Stoke 44
13 Crystal Palace 41
14 Swansea 41
15 Burnley 40
16 Watford 40
17 Hull 34
18 Middlesbrough 28
19 Sunderland 24
Team object
Points int64
dtype: object
Team object
Points int64
dtype: object
Traceback (most recent call last):
File "C:/Users/Michael/Documents/Programming/Python/Premier League.py", line 99, in <module>
display_hbar(final_table)
File "C:/Users/Michael/Documents/Programming/Python/Premier League.py", line 92, in display_hbar
df.plot(kind='barh',x='Points',y='Team')
File "C:\Program Files (x86)\Python36-32\lib\site-packages\pandas\plotting\_core.py", line 2941, in __call__
sort_columns=sort_columns, **kwds)
File "C:\Program Files (x86)\Python36-32\lib\site-packages\pandas\plotting\_core.py", line 1977, in plot_frame
**kwds)
File "C:\Program Files (x86)\Python36-32\lib\site-packages\pandas\plotting\_core.py", line 1804, in _plot
plot_obj.generate()
File "C:\Program Files (x86)\Python36-32\lib\site-packages\pandas\plotting\_core.py", line 258, in generate
self._compute_plot_data()
File "C:\Program Files (x86)\Python36-32\lib\site-packages\pandas\plotting\_core.py", line 373, in _compute_plot_data
'plot'.format(numeric_data.__class__.__name__))
TypeError: Empty 'DataFrame': no numeric data to plot
What am I doing wrong in my display_hbar function that is preventing me from plotting my data?
Here is the csv file

df.plot(x = "Team", y="Points", kind="barh");

You should swap x and y in df.plot(...), because y must be numeric according to the pandas documentation.

how to convert multiple csv files to multiple tables in sqlite using python3?

I was trying to import multiple CSV files into multiple tables in a SQLite database (using a Jupyter notebook with Python 3). The name of each file will be the name of the table. I have defined a function to convert the encoding to UTF-8 as below:
import os
import sqlite3
import glob
import csv
import sys

def convert_to_utf8(dirname):
    for filename in glob.glob(os.path.join(dirname, '*.csv')):
        ifp = open(filename, "rt", encoding='cp1252')
        input_data = ifp.read()
        ifp.close()
        # The cleaned copy is written next to the original as <name>.csv.fix
        ofp = open(filename + ".fix", "wt", encoding='utf-8')
        for c in input_data:
            if c != '\0':
                ofp.write(c)
        ofp.close()
    return
All the files are in the same folder; staging_dir_name_1 is where the files are. I have the code below to convert the CSV files into tables; some of the code is from similar questions on Stack Overflow:
convert_to_utf8(staging_dir_name_1)
conn = sqlite3.connect("medicare_hospital_compare_1.db")
c = conn.cursor()
for filename in glob.glob(os.path.join(staging_dir_name_1, '*.csv')):
    with open(filename, "rb") as f:
        data = csv.DictReader(f)
        cols = data.fieldnames
        tablename = os.path.splitext(os.path.basename(filename))[0]
        sql_str = "drop table if exists %s" % tablename
        c.execute(sql_str)
        sql_str = "create table if not exists %s (%s)" % (tablename, ','.join(["%s text" % col for col in cols]))
        c.execute(sql_str)
        sql_str = "insert into %s values (%s)" % (tablename, ','.join(["?" for col in cols]))
        c.executemany(sql_str, (list(map(row.get, cols)) for row in data))
        conn.commit()
But when I run this, I get this error:
> Error Traceback (most recent call
> last) <ipython-input-29-be7c1f43e4c5> in <module>()
> 2 with open(filename, "rb") as f:
> 3 data = csv.DictReader(f)
> ----> 4 cols = data.fieldnames
> 5 tablename = os.path.splitext(os.path.basename(filename))[0]
> 6
>
> C:\Users\dupin\Anaconda3\lib\csv.py in fieldnames(self)
> 96 if self._fieldnames is None:
> 97 try:
> ---> 98 self._fieldnames = next(self.reader)
> 99 except StopIteration:
> 100 pass
>
> Error: iterator should return strings, not bytes (did you open the
> file in text mode?)
Could anyone help me on how to resolve this issue? I have been thinking about it for a while but still couldn't figure out how to resolve this.
Update:
Now that I have changed 'rb' to 'rt', I get a new error about NULL bytes, although I think the first function has already removed all the null values:
Error Traceback (most recent call last)
<ipython-input-77-68d56c0b4cf2> in <module>()
3
4 data = csv.DictReader(f)
----> 5 cols = data.fieldnames
6 table = os.path.splitext(os.path.basename(filename))[0]
7
C:\Users\dupin\Anaconda3\lib\csv.py in fieldnames(self)
96 if self._fieldnames is None:
97 try:
---> 98 self._fieldnames = next(self.reader)
99 except StopIteration:
100 pass
Error: line contains NULL byte
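
One detail visible in the code above: convert_to_utf8 writes the cleaned copies as <name>.csv.fix, while the loading loop globs *.csv, so it still opens the original files that contain the NUL characters. A possible alternative, sketched below under stated assumptions, is to skip the intermediate files and strip NULs while reading. The function name load_csv_into_table is hypothetical, the cp1252 encoding is taken from the question, and table and column names are assumed not to contain double quotes.

```python
import csv
import os
import sqlite3


def load_csv_into_table(conn, csv_path, encoding="cp1252"):
    """Create a table named after the file and insert every CSV row as text."""
    table = os.path.splitext(os.path.basename(csv_path))[0]
    # Text mode is required: csv expects str lines, not bytes
    with open(csv_path, "rt", encoding=encoding, newline="") as f:
        # Drop NUL characters line by line before csv sees them
        cleaned = (line.replace("\0", "") for line in f)
        reader = csv.DictReader(cleaned)
        cols = reader.fieldnames
        if not cols:
            return None  # empty file: nothing to create
        col_defs = ", ".join('"%s" text' % col for col in cols)
        placeholders = ", ".join("?" for _ in cols)
        conn.execute('drop table if exists "%s"' % table)
        conn.execute('create table "%s" (%s)' % (table, col_defs))
        conn.executemany(
            'insert into "%s" values (%s)' % (table, placeholders),
            ([row.get(col) for col in cols] for row in reader),
        )
    conn.commit()
    return table
```

It could then be called once per file, for example inside the existing glob loop with the connection opened in the question. Quoting the identifiers lets headers with spaces work as column names, at the cost of making them case-preserving in the schema.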
