Parse filename information into multiple columns in the concatenated csv file - python-3.x

I have multiple csv files in a folder, each with a unique file name such as W10N1_RTO_T0_1294_TL_IV_Curve.csv. I would like to concatenate all the files together and create multiple columns based on the filename information. For example, W10N1 would become a column called DieID.
I am a beginner at programming and Python and couldn't figure out how to do this easily.
import os
import glob
import pandas as pd

os.chdir('filepath')
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames])
combined_csv.to_csv('combined_csv.csv', index=False)

import os
os.listdir("your_target_directory")
will return a list of all files and directories in "your_target_directory".
Then it is just string manipulation, e.g.
>>> x = 'blue_red_green'
>>> x.split('_')
['blue', 'red', 'green']
>>> a, b, c = x.split('_')
>>> a
'blue'
>>> b
'red'
>>> c
'green'
Also split on "." first to strip the .csv extension.
At last, create a CSV using whatever separator you want:
f = open("yourfancyname.csv", "w+")
f.write("DieID,your_fancy_other_IDs\r\n")  # header row; use "%d" % i style formatting if you want a variable in a line
f.close()
EZ as A B C
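Putting the pieces together with pandas, here is a sketch; the column names (DieID, Process, Temp, DeviceID, Side) and the filename layout are assumptions based on the example W10N1_RTO_T0_1294_TL_IV_Curve.csv, so adjust them to your real naming scheme:

```python
import glob
import os
import pandas as pd

# Assumed mapping of underscore-separated filename parts to column names,
# based on W10N1_RTO_T0_1294_TL_IV_Curve.csv -- adjust as needed.
PART_NAMES = ['DieID', 'Process', 'Temp', 'DeviceID', 'Side']

def combine_with_filename_columns(folder):
    """Concatenate every CSV in `folder`, adding columns parsed from each filename."""
    frames = []
    for path in glob.glob(os.path.join(folder, '*.csv')):
        stem = os.path.splitext(os.path.basename(path))[0]  # strip the .csv extension
        parts = stem.split('_')
        df = pd.read_csv(path)
        for col, value in zip(PART_NAMES, parts):  # first five parts become columns
            df[col] = value
        frames.append(df)
    return pd.concat(frames, ignore_index=True)

# combined = combine_with_filename_columns('filepath')
# combined.to_csv('combined_csv.csv', index=False)
```

Every row from a given file gets the same DieID (etc.) values, so after concatenation you can still tell which file each row came from.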

Related

How to change data type of a same column in 7000 csv file and replace the files with the updated ones

I have 7000 csv files with 30000 records in each, and all the files have the same column headers and the same column count, i.e. 14.
I want to change the data type of columns 1, 3 and 4, which I can do with pandas for a single file, but how can I do it for all the files in a loop and replace each file with its updated version?
I tried the code below, but honestly I copied it from somewhere else, so I don't know where to give the path of my csv files folder or how it will replace the files.
import pandas as pd
import os
import glob

def main():
    path = os.getcwd()  # put the path to your csv folder here instead, e.g. path = r'C:\my\csv\folder'
    csv_files = glob.glob(os.path.join(path, "*.csv"))
    for f in csv_files:
        df = pd.read_csv(f)
        df[['LOAD DATE', 'DATE OF ISSUE', 'DATE OF DEPARTURE']] = df[['LOAD DATE', 'DATE OF ISSUE', 'DATE OF DEPARTURE']].apply(pd.to_datetime, errors='coerce')
        df.to_csv(f, index=False)  # writing to the same path f replaces the original file

main()
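To answer the "where do I give the path" part directly, here is a sketch that takes the folder as an explicit argument instead of relying on the current working directory (the commented-out path is a placeholder):

```python
import glob
import os
import pandas as pd

DATE_COLS = ['LOAD DATE', 'DATE OF ISSUE', 'DATE OF DEPARTURE']

def convert_dates_in_folder(folder):
    """Convert the date columns in every CSV in `folder`, overwriting each file in place."""
    for f in glob.glob(os.path.join(folder, '*.csv')):
        df = pd.read_csv(f)
        df[DATE_COLS] = df[DATE_COLS].apply(pd.to_datetime, errors='coerce')
        df.to_csv(f, index=False)  # same path -> the original file is replaced

# convert_dates_in_folder(r'C:\path\to\your\csv\folder')
```

errors='coerce' turns unparseable values into NaT rather than raising, which is usually what you want when processing thousands of files unattended.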

How to read files from folder based on column value of dataframe

I have a column with some numbers. For each number I want to check whether it matches any file name in a folder; if it matches, read that file, otherwise move on to the next number.
df = pd.DataFrame({'x': ['2000', '5000', '10000']})
files_folder:
P2000.csv
P4000.csv
P5000.csv
P6000.csv
result - read these files:
P2000.csv
P5000.csv
Use glob and test for the substring with any in a list comprehension:
import glob
import pandas as pd

df = pd.DataFrame({'x': ['2000', '5000', '10000']})
for f in glob.glob('files_folder/*.csv'):
    if any(x in f for x in df['x']):
        print(f)

files_folder\P2000.csv
files_folder\P5000.csv
As a list comprehension:
files = [f for f in glob.glob('files_folder/*.csv') if any(x in f for x in df['x'])]
print(files)
['files_folder\\P2000.csv', 'files_folder\\P5000.csv']
You can use glob.glob() to list all csv files in your files_folder, then use apply() to check whether each value in x appears in that list of filenames.
import glob
import numpy as np

files = glob.glob("/path/to/*.csv")
files = df['x'].apply(lambda x: f'P{x}.csv' if any(f'P{x}' in k for k in files) else np.nan).dropna().tolist()
print(files)
['P2000.csv', 'P5000.csv']
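Since the goal is to actually read the matching files, not just list them, the final step might look like this sketch, which collects one DataFrame per matching file (the folder name files_folder comes from the question):

```python
import glob
import os
import pandas as pd

def read_matching_files(folder, numbers):
    """Read every CSV in `folder` whose name contains one of `numbers`."""
    matched = {}
    for path in glob.glob(os.path.join(folder, '*.csv')):
        name = os.path.basename(path)
        if any(num in name for num in numbers):
            matched[name] = pd.read_csv(path)  # one DataFrame per matching file
    return matched

# df = pd.DataFrame({'x': ['2000', '5000', '10000']})
# frames = read_matching_files('files_folder', df['x'])
```

Numbers with no matching file (10000 in the example) are simply skipped.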

Python merging excel files in directory

I have thousands of files inside a directory, named with the pattern YYYYMMDDHHMM:
201901010000.xlsx
201901010001.xlsx
201901010002.xlsx
201801010000.xlsx
201801010001.xlsx
201801010002.xlsx
I want to merge files that begin with the same YYYY (2018 and 2019 as separate outputs) into one excel file each, like below.
First file:
201901010000.xlsx
201901010001.xlsx
201901010002.xlsx
Second file:
201801010000.xlsx
201801010001.xlsx
201801010002.xlsx
You will need to parse each file and concatenate with pandas:
import pandas as pd
import glob

my_path = "c:\\temp\\"
for year in ['2018', '2019']:
    buf = []
    year_files = glob.glob(my_path + year + "*.xlsx")
    for file in year_files:
        df = pd.read_excel(file)
        buf.append(df)
    year_df = pd.concat(buf)
    year_df.to_excel(year + ".xlsx")

How to combine multiple csv files based on file name

I have more than 1000 csv files and I want to combine those whose filenames share the same first five digits into one csv file.
input:
100044566.csv
100040457.csv
100041458.csv
100034566.csv
100030457.csv
100031458.csv
100031459.csv
import pandas as pd
import os
import glob

path_1 = ''
all_files_final = glob.glob(os.path.join(path_1, "*.csv"))
names_1 = [os.path.basename(x1) for x1 in all_files_final]
final = pd.DataFrame()
for file_1, name_1 in zip(all_files_final, names_1):
    file_df_final = pd.read_csv(file_1, index_col=False)
    #file_df['file_name'] = name
    final = final.append(file_df_final)
final.to_csv('', index=False)
I used the above code, but it merges all the files into one csv file; I don't know how to make the selection based on the name.
So from the above input:
Output 1: combine the first three csv files into one csv file, because the first five digits of their filenames are the same.
Output 2: combine the next four files into one csv file, because the first five digits of their filenames are the same.
I would recommend approaching the problem slightly differently.
Here's my solution:
import os
import pandas as pd

files = os.listdir('.')  # returns a list of filenames in the current folder
files_of_interest = {}  # a dictionary that we will fill below
for filename in files:  # iterate over the files in the folder
    if filename[-4:] == '.csv':  # check whether the file is in .csv format
        key = filename[:5]  # as mentioned in the question, the first five characters of the filename are what matters
        files_of_interest.setdefault(key, [])  # if the key does not exist yet, .setdefault creates it with an empty list
        files_of_interest[key].append(filename)  # append the new filename to the list
for key in files_of_interest:
    buff_df = pd.DataFrame()
    for filename in files_of_interest[key]:
        buff_df = buff_df.append(pd.read_csv(filename))  # read every file for this key and append it to buff_df
    files_of_interest[key] = buff_df  # replace the list of filenames with a data frame
This code builds a dictionary of dataframes, where the keys are the unique first five characters of the .csv filenames. You can then iterate over the keys of the dictionary to save each corresponding dataframe as a .csv file.
Hope my answer helped.
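The final saving step described above can be sketched like this; the "_combined.csv" suffix is an arbitrary choice, and the literal dictionary here just stands in for the files_of_interest built by the answer's loop:

```python
import pandas as pd

# Stand-in for files_of_interest after the loop above:
# each five-character key maps to a combined DataFrame.
files_of_interest = {
    '10004': pd.DataFrame({'a': [1, 2]}),
    '10003': pd.DataFrame({'a': [3]}),
}

# Write one combined CSV per key.
for key, frame in files_of_interest.items():
    frame.to_csv(f'{key}_combined.csv', index=False)
```

This produces one output file per group, e.g. 10004_combined.csv holding the rows of all input files that started with 10004.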

CSV to Pythonic List

I'm trying to convert a CSV file into a Python list. I have strings organized in columns and need an automated way to turn them into a list.
My code works with pandas, but I only see the values again as plain text.
import pandas as pd
data = pd.read_csv("Random.csv", low_memory=False)
dicts = data.to_dict().values()
print(data)
The final result should be something like this: ('Dan', 'Zac', 'David')
You can simply do this using the csv module in Python:
import csv

with open('random.csv', 'r') as f:
    reader = csv.reader(f)
    your_list = list(map(list, reader))
print(your_list)
Note that in Python 3, map returns an iterator, hence the list() call, and print is a function.
If you really want a list, try this:
import pandas as pd
data = pd.read_csv('Random.csv', low_memory=False, header=None).iloc[:,0].tolist()
This produces
['Dan', 'Zac', 'David']
If you want a tuple instead, just cast the list:
data = tuple(pd.read_csv('Random.csv', low_memory=False, header=None).iloc[:,0].tolist())
And this produces
('Dan', 'Zac', 'David')
I assumed that you use commas as separators in your csv and your file has no header. If this is not the case, just change the params of read_csv accordingly.
