Python: Append CSV files in a folder into one big file - python-3.x

I am a little confused with the Pandas library and would really appreciate your help.
The task is to combine all *.csv files in a folder into one big file.
The CSV files don't have a header, so I just want to append all of them and add a header at the end.
Here is the code I use.
The final file is "ALBERTA GENERAL"; at the beginning I delete the old one before creating an updated version.
os.chdir(dataFolder)
with io.open("ALBERTA GENERAL.csv", "w+", encoding='utf8') as f:
    os.remove("ALBERTA GENERAL.csv")
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
combined_csv = pd.concat([pd.read_csv(f, error_bad_lines=False) for f in all_filenames], axis=0, ignore_index=True)
print(combined_csv)
with io.open('ALBERTA GENERAL.csv', "w+", encoding='utf8') as outcsv:
    writer = csv.DictWriter(outcsv, fieldnames=["Brand, Name, Strain, Genre, Product type, Date"], delimiter=";")
    writer.writeheader()
    combined_csv.to_csv(outcsv, index=False, encoding='utf-8-sig')
But I get a confusing result that I don't know how to fix.
The final file doesn't append the intermediate files one below another; instead, it adds new columns for each subsequent file. I tried adding the same headers to the intermediate files, but it did not help.
On top of that, the header is not split into columns; it is written as one single field.
Can anyone help me fix my code, please?
Here is the link to the files
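Two details in the posted code likely explain both symptoms. pd.read_csv is called without header=None, so the first data row of each header-less file is promoted to column labels; concat then aligns on those mismatched labels and grows new columns instead of stacking rows. And the DictWriter is handed fieldnames=["Brand, Name, Strain, Genre, Product type, Date"], a single comma-joined string, so the header is written as one field; it would need to be an actual list of six names, for example:

writer = csv.DictWriter(outcsv, fieldnames=["Brand", "Name", "Strain", "Genre", "Product type", "Date"], delimiter=";")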

Just to fix the irregularities of the first file:
with open('ALBERTA GENERAL.csv', 'r') as f_in, open('ALBERTA GENERAL_fixed.csv', 'w') as f_out:
    for line in f_in:
        line = line.replace(',', ';')
        line = line.strip().rstrip(';')
        line = line.strip().lstrip(';')
        f_out.write(line + '\n')
os.remove('ALBERTA GENERAL.csv')
We will import the first file separately because it has different requirements than the others:
df1 = pd.read_csv('ALBERTA GENERAL_fixed.csv',header=0,sep=';')
We can then do the other two:
df2 = pd.read_csv('file_ALBERTA_05.14.2020.csv',header=None,sep=';')
df3 = pd.read_csv('file_ALBERTA_05.18.2020.csv',header=None,sep=';')
df2.columns = df1.columns
df3.columns = df1.columns
Final steps:
combined = pd.concat([df1,df2,df3])
combined.to_csv('out.csv',index=False)
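If all the remaining files share the same six-column layout, the whole folder can be combined without naming each file individually. Below is a minimal sketch of that idea; the column names come from the header in the question, while the pattern 'file_ALBERTA_*.csv' is an assumption based on the file names above:

import glob
import pandas as pd

columns = ['Brand', 'Name', 'Strain', 'Genre', 'Product type', 'Date']

# Read every header-less CSV and let pandas stack them row-wise.
frames = [pd.read_csv(name, header=None, names=columns, sep=';')
          for name in glob.glob('file_ALBERTA_*.csv')]

combined = pd.concat(frames, ignore_index=True)

# to_csv writes the header exactly once, so no manual DictWriter is needed.
combined.to_csv('ALBERTA GENERAL.csv', sep=';', index=False)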

Related

Compare 2 CSV files (encoded = "utf8") keeping data format

I have 2 stock lists (new and old). How can I compare them to see which items have been added and which have been removed (happy to add them to 2 different files, added and removed)?
So far I have tried something along the lines of looking row by row.
import csv
new = "new.csv"
old = "old.csv"
add_file = "add.csv"
remove_file = "remove.csv"
with open(new, encoding="utf8") as new_read, open(old, encoding="utf8") as old_read:
    new_reader = csv.DictReader(new_read)
    old_reader = csv.DictReader(old_read)
    for new_row in new_reader:
        for old_row in old_reader:
            if old_row["STOCK CODE"] == new_row["STOCK CODE"]:
                print("found")
This works for 1 item; if I add an else: it just keeps printing until the item is found, so it's not an accurate way of comparing the files.
I have about 5k rows.
There must be a better way to add the differences to the 2 different files while keeping the same data structure at the same time.
N.B. I have tried this link: Python : Compare two csv files and print out differences
2 minor issues:
1. the data structure is not kept
2. there is no reference to the change of location
You could just read the data into memory and then compare.
I used sets for the codes in this example for faster lookup.
import csv

def get_csv_data(file_name):
    data = []
    codes = set()
    with open(file_name, encoding="utf8") as csv_file:
        reader = csv.DictReader(csv_file)
        for row in reader:
            data.append(row)
            codes.add(row['STOCK CODE'])
    return data, codes

def write_csv(file_name, data, codes):
    with open(file_name, 'w', encoding="utf8", newline='') as csv_file:
        headers = list(data[0].keys())
        writer = csv.DictWriter(csv_file, fieldnames=headers)
        writer.writeheader()
        for row in data:
            if row['STOCK CODE'] not in codes:
                writer.writerow(row)

new_data, new_codes = get_csv_data('new.csv')
old_data, old_codes = get_csv_data('old.csv')
write_csv('add.csv', new_data, old_codes)
write_csv('remove.csv', old_data, new_codes)
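Two details are worth noting about this design. Membership tests on a set are O(1) on average, so each file is read exactly once and the ~5k rows mentioned are no problem. It also avoids the bug in the nested-loop version: there, the inner DictReader is exhausted after the first outer iteration, so nothing is ever compared again.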

How to skip entire file when iterating through folder?

I have a folder with many files in it.
Some of the files don't have the data I want in them. How do I skip over these files and move on to the next set of files?
import glob
import json
from pandas import json_normalize

path = '/path/'  # use your path
allFiles = glob.glob(path + "/*.json")
for file_ in allFiles:
    # print(file_)
    with open(file_) as f:
        data = json.load(f)
    df = json_normalize(data['col_to_be_flattened'])
    REST OF THE OPERATIONS
Once the data is in the dataframe at the point df, the REST OF THE OPERATIONS relies on a column called 'Rows.Row'; if this column does not exist in df, I want to skip the file. How do I do this?
Just check whether 'Rows.Row' is among the column names before continuing.
import glob
import json
from pandas import json_normalize

path = '/path/'  # use your path
allFiles = glob.glob(path + "/*.json")
for file_ in allFiles:
    # print(file_)
    with open(file_) as f:
        data = json.load(f)
    df = json_normalize(data['col_to_be_flattened'])
    if 'Rows.Row' in df.columns.tolist():
        REST OF THE OPERATIONS
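An equivalent, slightly flatter variant is to invert the test and continue, so unusable files are skipped up front. This is only a sketch under the same assumptions as above: json_normalize is imported from pandas (pandas 1.0+), and the path and 'col_to_be_flattened' are placeholders from the question.

import glob
import json
from pandas import json_normalize

for file_ in glob.glob('/path/' + "/*.json"):
    with open(file_) as f:
        data = json.load(f)
    df = json_normalize(data['col_to_be_flattened'])
    if 'Rows.Row' not in df.columns:
        continue  # this file lacks the required column, move on
    # REST OF THE OPERATIONS goes here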

Extract numbers and text from csv file with Python3.X

I am trying to extract data from a CSV file with Python 3.6.
The data are both numbers and text (URL addresses):
file_name = [-0.47, 39.63, http://example.com]
On multiple forums I found this kind of code:
data = numpy.genfromtxt(file_name, delimiter=',', skip_header=skiplines,)
But this works for numbers only; the URL addresses are read as NaN.
If I add dtype:
data = numpy.genfromtxt(file_name, delimiter=',', skip_header=skiplines, dtype=None)
The URL addresses are read correctly, but they get a "b" at the beginning of the address, such as:
b'http://example.com'
How can I remove that? How can I just get the plain string of text?
I also found this option:
file = open(file_path, "r")
csvReader = csv.reader(file)
for row in csvReader:
    variable = row[i]
    coordList.append(variable)
but it seems to have some issues with Python 3.
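The b'...' prefix means genfromtxt returned the text fields as bytes rather than str. A minimal sketch of a fix, assuming NumPy 1.14 or later (where genfromtxt accepts an encoding argument; the file name and skip_header value here are placeholders):

import numpy

# dtype=None lets genfromtxt infer a type per column; encoding='utf-8'
# makes it decode text fields to str instead of leaving them as bytes.
data = numpy.genfromtxt('coords.csv', delimiter=',', skip_header=1,
                        dtype=None, encoding='utf-8')

The plain csv.reader approach also works fine on Python 3, as long as the file is opened in text mode with newline='' (e.g. open(file_path, 'r', newline='')); everything it yields is already str.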

How to write into a CSV file with Python

Background:
I have a CSV file (csv_dump) with data from a MySQL table. I want to copy some of the lines that meet certain conditions (row[1] == condition_1 and row[2] == condition_2) into a temporary CSV file (csv_temp).
Code Snippet:
f_reader = open(csv_dump, 'r')
f_writer = open(csv_temp, 'w')
temp_file = csv.writer(f_writer)
lines_in_csv = csv.reader(f_reader, delimiter=',', skipinitialspace=False)
for row in lines_in_csv:
    if row[1] == condition_1 and row[2] == condition_2:
        temp_file.writerow(row)
f_reader.close()
f_writer.close()
Question:
How can I copy the line that is being read "as is" into the temp file with Python 3?
test.csv
data1,data2,data3
120,80,200
140,50,210
170,100,250
150,70,300
180,120,280
Here is the code:
import csv
with open("test.csv", 'r') as incsvfile:
    input_csv = csv.reader(incsvfile, delimiter=',', skipinitialspace=False)
    with open('tempfile.csv', 'w', newline='') as outcsvfile:
        spamwriter = csv.writer(outcsvfile, delimiter=',', quoting=csv.QUOTE_MINIMAL)
        first_row = next(input_csv)
        spamwriter.writerow(first_row)
        for row in input_csv:
            if int(row[1]) != 80 and int(row[2]) != 300:
                spamwriter.writerow(row)
output tempfile.csv
data1,data2,data3
140,50,210
170,100,250
180,120,280
If you don't have a header row, remove these two lines:
first_row = next(input_csv)
spamwriter.writerow(first_row)
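If the goal is literally to copy matching lines byte-for-byte, a minimal sketch of an alternative (assuming no quoted fields contain embedded newlines) is to parse each raw line only to test it, then write the original string through untouched:

import csv

with open('test.csv', 'r') as f_in, open('tempfile.csv', 'w') as f_out:
    f_out.write(next(f_in))                 # copy the header line unchanged
    for line in f_in:
        row = next(csv.reader([line]))      # parse just this one line
        if int(row[1]) != 80 and int(row[2]) != 300:
            f_out.write(line)               # write the raw line "as is"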
The following Python script seems to do the job. Having said that, you should probably be using a MySQL query to do this work directly, instead of re-processing from an intermediate CSV file. But I guess there must be some good reason for wanting to do that?
mycsv.csv:
aa,1,2,5
bb,2,3,5
cc,ddd,3,3
hh,,3,1
as,hfd,3,3
readwrite.py:
import csv

# Python 3: open in text mode with newline='' (the binary 'rb'/'wb' modes
# only work with the csv module on Python 2).
with open('mycsv.csv', 'r', newline='') as infile:
    with open('out.csv', 'w', newline='') as outfile:
        inreader = csv.reader(infile, delimiter=',', quotechar='"')
        outwriter = csv.writer(outfile)
        for row in inreader:
            if row[2] == row[3]:
                outwriter.writerow(row)
out.csv:
cc,ddd,3,3
as,hfd,3,3
With a little more work, you could change the csv.writer to use the same delimiters and escape/quote characters as the csv.reader. It's not exactly the same as writing out the raw line from the file, but it should be practically as fast, since any line whose fields we were able to check has already been parsed without error.
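As a sketch of that idea, the shared settings can live in one dict so the reader and writer cannot drift apart (the values below are simply the ones already used above):

import csv

# One source of truth for the CSV dialect used on both sides.
dialect_kwargs = dict(delimiter=',', quotechar='"')

with open('mycsv.csv', 'r', newline='') as infile, \
        open('out.csv', 'w', newline='') as outfile:
    inreader = csv.reader(infile, **dialect_kwargs)
    outwriter = csv.writer(outfile, **dialect_kwargs)
    for row in inreader:
        if row[2] == row[3]:
            outwriter.writerow(row)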

combining results from two different cursors and then writing to a csv file in python 3

I am new to Python, and I am working on a script that generates a CSV report from database data when given an ID as input. It works fine with one cursor. Now I have two different databases, and I want to generate a single report that combines the results of both cursors. How do I combine the results from each cursor horizontally? Is that possible in Python 3? Please give me some suggestions. Here is the code I am working with, which involves one cursor:
cur = conn.cursor()
cur.execute("Select * from FailureAnalysisResults where LotName = ? and TestResultID = ?", (lot_name, testResultID))
with open(csvfile, 'w', newline='') as fout:
    writer = csv.writer(fout, delimiter=',', quotechar=' ', quoting=csv.QUOTE_MINIMAL)
    writer.writerow([i[0] for i in cur.description])  # heading row
    writer.writerows(cur.fetchall())
I want to do the above for another database and combine the results of both cursors before writing them to the CSV file. I tried looking into arrays, but I am stuck and need some suggestions. Thank you.
Well, I could achieve what I asked above with the following code snippet:
list1 = list(cur1)
list2 = list(cur2)
list3 = list(zip(list1, list2))
with open(csvfile, 'w', newline='') as fout:
    writer = csv.writer(fout, delimiter=',', quotechar=' ', quoting=csv.QUOTE_MINIMAL)
    writer.writerow([i[0] for i in cur1.description] + [j[0] for j in cur2.description])  # heading row
    writer.writerows(list3)
The above code concatenates the two result sets well, but I have formatting issues in the generated .csv file: the tuples' parentheses and commas ('(', ',', ')') are written literally into the file, which breaks the layout.
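That happens because zip pairs up whole row tuples, so each field handed to writerows is itself a tuple, and csv.writer stringifies it, parentheses and all. A minimal sketch of a fix, reusing the names from the snippet above, is to flatten each pair into a single row first:

import csv

list1 = list(cur1)  # cur1, cur2, csvfile come from the question's context
list2 = list(cur2)

# Flatten each (row_from_cur1, row_from_cur2) pair so every value gets its own cell.
rows = [list(r1) + list(r2) for r1, r2 in zip(list1, list2)]

with open(csvfile, 'w', newline='') as fout:
    writer = csv.writer(fout, delimiter=',', quoting=csv.QUOTE_MINIMAL)
    writer.writerow([i[0] for i in cur1.description] + [j[0] for j in cur2.description])
    writer.writerows(rows)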
