I have multiple csv files in a folder and each has a unique file name such as W10N1_RTO_T0_1294_TL_IV_Curve.csv. I would like to concatenate all files together and create multiple columns based on the filename information. For example, W10N1 is one column called DieID.
I am a beginner in programming and Python, and I couldn't figure out how to do it easily.
import glob
import pandas as pd

extension = 'csv'  # file extension to match
all_filenames = glob.glob('*.{}'.format(extension))
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames])

import os
os.listdir("your_target_directory")
will return a list of all files and directories in "your_target_directory".
Then it is just string manipulation, e.g.:
>>> x = 'blue_red_green'
>>> x.split('_')
['blue', 'red', 'green']
>>> a, b, c = x.split('_')
>>> a
'blue'
>>> b
'red'
>>> c
'green'
Also split on "." first to remove the .csv extension.
Finally, create a CSV file, using whatever separator you want.
f= open("yourfacnyname.csv","w+")
f.write("DieID You_fancy_other_IDs also_if_u_want_variable_use_this_%d\r\n" % (i+1))
EZ as A B C
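To tie this back to the original question, here is a minimal sketch that reads every CSV in a folder, splits each file name on "_", and stores the leading parts as new columns before concatenating. Only DieID comes from the question; the other column names, and the helper names load_with_filename_columns and combine_folder, are made up for illustration, so rename them to whatever the parts actually mean.

```python
import glob
import os

import pandas as pd

# Hypothetical column names, one per underscore-separated filename part;
# only DieID is named in the question.
PART_NAMES = ["DieID", "Process", "Step", "Lot", "Site"]


def load_with_filename_columns(path):
    """Read one CSV and add a column for each leading part of its file name."""
    stem = os.path.splitext(os.path.basename(path))[0]  # drop folder and ".csv"
    parts = stem.split("_")
    df = pd.read_csv(path)
    # zip stops at the shorter sequence, so trailing parts ("IV", "Curve") are ignored.
    for name, value in zip(PART_NAMES, parts):
        df[name] = value
    return df


def combine_folder(folder="."):
    """Concatenate every CSV in `folder`, tagging rows with filename parts."""
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    return pd.concat([load_with_filename_columns(p) for p in paths], ignore_index=True)
```

Calling combine_folder("some_folder") returns one DataFrame with the original columns plus the filename columns. Note that pd.concat raises a ValueError if the folder contains no CSV files, and that the filename parts are stored as strings.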


How to change the data type of the same columns in 7000 CSV files and replace the files with the updated ones

I have 7000 CSV files with 30000 records in each file, and all the files have the same column headers and the same column count, i.e. 14.
I want to change the data type of columns 1, 3 and 4, which I can do with pandas. My question is how to do this for all the files without loading them one by one; in other words, how can I do it in a loop, given that I want to replace each file with its updated version?
I tried this code, but honestly I copied it from somewhere else, so I don't know where to give the path of my CSV folder or how it will replace the files.
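import pandas as pd
import os
import glob

def main():
    path = os.getcwd()  # folder that holds the CSV files; put your folder path here
    csv_files = glob.glob(os.path.join(path, "*.csv"))
    for f in csv_files:
        df = pd.read_csv(f)
        # convert the date columns; values that cannot be parsed become NaT
        df[['LOAD DATE','DATE OF ISSUE','DATE OF DEPARTURE']] = df[['LOAD DATE','DATE OF ISSUE','DATE OF DEPARTURE']].apply(pd.to_datetime, errors='coerce')
        df.to_csv(f, index=False)  # writing to the same path overwrites the original file

if __name__ == "__main__":
    main()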
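As for where the folder path goes and how the files get replaced, a minimal sketch is below. It takes the folder as a parameter and writes each DataFrame back to the path it was read from. The function name convert_dates_in_folder and its parameters are my own for illustration, and it assumes every file contains the listed columns.

```python
from pathlib import Path

import pandas as pd


def convert_dates_in_folder(folder, date_columns):
    """Parse `date_columns` as datetimes in every CSV in `folder`, in place."""
    changed = []
    for csv_path in sorted(Path(folder).glob("*.csv")):
        df = pd.read_csv(csv_path)
        for col in date_columns:
            # Unparseable values become NaT instead of raising an error.
            df[col] = pd.to_datetime(df[col], errors="coerce")
        df.to_csv(csv_path, index=False)  # same path, so the file is overwritten
        changed.append(csv_path.name)
    return changed
```

Keep in mind that a CSV file is plain text and does not store data types. After rewriting, the dates are in a consistent format on disk, but you still need parse_dates (or pd.to_datetime) when you read the files again. Since the originals are overwritten, it is worth trying this on a copy of the folder first.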

How to read files from folder based on column value of dataframe

I have a column with some numbers. For each number, I want to check whether it matches any file name in a folder; if it does, read that file, and if not, move on to the next number.
read files :
Use glob, testing for a substring match with any() over a list comprehension (this assumes df['x'] holds strings):
import glob
for f in glob.glob('files_folder/*.csv'):
    if any([x in f for x in df['x']]):
        print (f)
List comprehension:
files = [f for f in glob.glob('files_folder/*.csv') if any([x in f for x in df['x']])]
print (files)
['files_folder\\P2000.csv', 'files_folder\\P5000.csv']
You can use glob.glob() to list all CSV files in your files_folder. Then use apply() to check whether each value in x appears in that list of filenames.
import glob
import numpy as np
files = glob.glob("/path/to/*.csv")
files = df['x'].apply(lambda x: f'P{x}.csv' if any([f'P{x}' in k for k in files]) else np.nan ).dropna().tolist()
['P2000.csv', 'P5000.csv']
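Both answers stop at listing the matching names. If you also want to read the matched files, something like the following sketch could work. The function name read_matching_files is hypothetical, and it assumes the values in the column appear literally in the file names (for example 2000 in P2000.csv).

```python
from pathlib import Path

import pandas as pd


def read_matching_files(df, folder, column="x"):
    """Return {file name: DataFrame} for CSVs whose name contains a value from df[column]."""
    wanted = [str(v) for v in df[column]]  # str() so numeric values can be compared to names
    loaded = {}
    for csv_path in sorted(Path(folder).glob("*.csv")):
        # Compare against the name without the folder and without ".csv".
        if any(value in csv_path.stem for value in wanted):
            loaded[csv_path.name] = pd.read_csv(csv_path)
    return loaded
```

Files with no matching number are simply skipped. If the column holds floats, str() gives values like "2000.0", which will not match, so convert the column to integers or strings first.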

Python merging excel files in directory

I have thousands of files inside a directory, named with this pattern, YYYY/MM/DD/HH/MM:
I want to merge the files that begin with the same YYYY (separate files for 2018 and 2019) into one Excel file each, as below.
this is first file
this is second file
You will need to parse each file and concatenate them with pandas:
import pandas as pd
import glob

my_path = "c:\\temp\\"
for year in ['2018', '2019']:
    buf = []
    year_files = glob.glob(my_path + year + "*.xlsx")
    for file in year_files:
        df = pd.read_excel(file)
        buf.append(df)  # collect the rows of each file
    if buf:  # pd.concat raises an error on an empty list
        year_df = pd.concat(buf)  # all rows for this year

How to combine multiple csv files based on file name

I have more than 1000 CSV files. I want to combine the files whose names share the same first five digits into one CSV file.
import pandas as pd
import os
import glob

path_1 = ''
all_files_final = glob.glob(os.path.join(path_1, "*.csv"))
names_1 = [os.path.basename(x1) for x1 in all_files_final]
final = pd.DataFrame()
for file_1, name_1 in zip(all_files_final, names_1):
    file_df_final = pd.read_csv(file_1, index_col=False)
    #file_df_final['file_name'] = name_1
    final = pd.concat([final, file_df_final])  # DataFrame.append was removed in pandas 2.0
I used the above code, but it merges all the files into one CSV file. I don't know how to select files based on their names.
So, from the above input:
Output 1: combine the first three CSV files into one CSV file, because the first five digits of their filenames are the same.
Output 2: combine the next four files into one CSV file, because the first five digits of their filenames are the same.
I would recommend approaching the problem slightly differently.
Here's my solution:
import os
import pandas as pd

files = os.listdir('.')  # returns list of filenames in current folder
files_of_interest = {}  # maps each five-character prefix to its list of filenames
for filename in files:  # iterate over files in the folder
    if filename[-4:] == '.csv':  # check whether the file is a .csv file
        key = filename[:5]  # as mentioned in the question, the first five characters of the filename are of interest
        files_of_interest.setdefault(key, [])  # if the key is missing, .setdefault creates it with an empty list
        files_of_interest[key].append(filename)  # add the filename to the list for this key
for key in files_of_interest:
    # read every file for this key and concatenate them (DataFrame.append was removed in pandas 2.0)
    buff_df = pd.concat([pd.read_csv(filename) for filename in files_of_interest[key]])
    files_of_interest[key] = buff_df  # replace the list of files with a DataFrame
This code will create a dictionary of DataFrames, where the keys are the unique five-character prefixes of the .csv filenames.
You can then iterate over the keys of the dictionary to save each corresponding DataFrame as a .csv file.
Hope my answer helped.
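For the saving step mentioned above, a minimal sketch could look like this. The function name save_grouped_frames and the "<key>_combined.csv" naming scheme are my own assumptions, not part of the answer. Writing into a separate output folder keeps the combined files from being picked up as inputs if you run the grouping code again.

```python
import os

import pandas as pd


def save_grouped_frames(frames_by_key, out_dir):
    """Write each DataFrame to <out_dir>/<key>_combined.csv and return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for key, frame in frames_by_key.items():
        out_path = os.path.join(out_dir, f"{key}_combined.csv")  # hypothetical naming scheme
        frame.to_csv(out_path, index=False)
        written.append(out_path)
    return written
```

With the dictionary built above, this would be called as save_grouped_frames(files_of_interest, "combined").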

CSV to Pythonic List

I'm trying to convert a CSV file into a Python list. I have strings organized in columns, and I need an automated way to turn them into a list.
My code works with pandas, but I only see them again as simple text.
import pandas as pd
data = pd.read_csv("Random.csv", low_memory=False)
dicts = data.to_dict().values()
So the final result should be something like this: ('Dan', 'Zac', 'David')
You can do this simply with the csv module in Python:
import csv
with open('random.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    your_list = list(reader)  # one list of fields per row
print(your_list)
If you really want a list, try this:
import pandas as pd
data = pd.read_csv('Random.csv', low_memory=False, header=None).iloc[:,0].tolist()
This produces
['Dan', 'Zac', 'David']
If you want a tuple instead, just cast the list:
data = tuple(pd.read_csv('Random.csv', low_memory=False, header=None).iloc[:,0].tolist())
And this produces
('Dan', 'Zac', 'David')
I assumed that you use commas as separators in your csv and your file has no header. If this is not the case, just change the params of read_csv accordingly.
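If you would rather not use pandas, the same result can be had with the csv module alone. This is a minimal sketch under the same assumptions (comma-separated, no header row, names in the first column); the function name first_column_as_tuple is just for illustration.

```python
import csv


def first_column_as_tuple(path):
    """Return the first field of every non-empty row in a header-less CSV as a tuple."""
    with open(path, newline="") as f:
        # csv.reader yields one list of fields per row; blank lines give an empty list.
        return tuple(row[0] for row in csv.reader(f) if row)
```

For a file that contains the three names from the question, one per line, first_column_as_tuple('Random.csv') gives ('Dan', 'Zac', 'David'). Wrap the call in list() if you need a list instead.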
