I have a folder with a lot of CSV files. I want to move column A to C, leave column A empty, and push all the other columns to the right.
I tried looking for something similar, but all the examples I found refer to specific CSV files rather than an iteration over a folder.
thank you,
Here is the code to iterate over the CSV files in your folder. Fill in the ellipsis with your own code:
import pathlib
import shutil

import pandas as pd

root_dir = pathlib.Path('your_directory_path_here')
for csvfile in root_dir.glob('*.csv'):
    df = pd.read_csv(csvfile, ...)  # read your csv
    # modify the order of your columns here
    ...
    shutil.copyfile(csvfile, f"{csvfile}.bak")  # back up your csv
    df.to_csv(csvfile, ...)  # write back your dataframe
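One way to fill in the column-reordering step, under one reading of "move column A to C" (the sample data and the `A_data` column name are my own placeholders, not from the question):

```python
import pandas as pd

# Hypothetical sample standing in for one of the CSV files.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6], "D": [7, 8]})

a_values = df.pop("A")             # take column A out, keeping its values
df.insert(1, "A_data", a_values)   # drop A's data into column C's old slot
df.insert(0, "A", "")              # restore an empty column A at the front
```

After this, the columns read A (empty), B, A_data, C, D, so everything from the old column C onward has shifted one position to the right.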
I have multiple CSVs, all with the same data structure.
All I want is to append them into one file, or to create a separate master CSV in which the records from all the files are stored.
Note: I don't want to use any dataframe library like pandas or dask.
Can somebody help me out?
Thanks
# append option
for csv_file in csv_files:
    with open(csv_file, 'r') as f:
        data = f.readlines()[1:]   # skip each file's header row
    with open(master_file, 'a') as f:
        f.writelines(data)         # data is a list of lines, so use writelines
Note - you will need to write the header line to master_file before the loop.
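A full version of the approach above, including the header step, might look like this (the file names are placeholders, and the sample files are created only to make the sketch self-contained):

```python
# Hypothetical sample files standing in for the real CSVs.
for name, rows in [("part1.csv", ["a,b", "1,2"]), ("part2.csv", ["a,b", "3,4"])]:
    with open(name, "w") as f:
        f.write("\n".join(rows) + "\n")

csv_files = ["part1.csv", "part2.csv"]
master_file = "master.csv"

# Write the shared header once, taken from the first file.
with open(csv_files[0]) as f:
    header = f.readline()
with open(master_file, "w") as f:
    f.write(header)

# Append every file's data rows, skipping each file's header.
for csv_file in csv_files:
    with open(csv_file) as f:
        data = f.readlines()[1:]
    with open(master_file, "a") as f:
        f.writelines(data)
```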
I am reading from the URL
import pandas as pd
url = 'https://ticdata.treasury.gov/Publish/mfh.txt'
Data = pd.read_csv(url, delimiter='\t')
But when I export the DataFrame, I see that all the columns are combined into a single column.
I tried different separators, but none of them worked. How can I get a proper DataFrame?
Please help
I just opened this file in a text editor. This is not a tab-delimited file, or any kind of delimited file. It is a fixed-width-fields file from lines 9 to 48. You should use pd.read_fwf instead, skipping some lines.
This would work:
import pandas as pd
url = 'https://ticdata.treasury.gov/Publish/mfh.txt'
Data = pd.read_fwf(url, widths=(31,8,8,8,8,8,8,8,8,8,8,8,8,8), skiprows=8, skipfooter=16, header=(0,1))
Data
The file you're linking to is not in CSV/TSV format. You would need to transform the data into something like that before loading it this way.
I am looking for a way to read datasets directly from the UCI Machine Learning Repository, but I am only able to get the dataset itself, not its description.
Here is the link https://archive.ics.uci.edu/ml/datasets/Car+Evaluation and https://archive.ics.uci.edu/ml/machine-learning-databases/car/ to the data I want to import.
The files are .data and .names.
How do you import them into Python as a data frame?
I have tried the code below, where I have to write the feature (column) names manually. Is there a way to read the .names file and set the features from there?
Creating the feature names manually might be OK for a dataset with a handful of features, but as the number of features grows it becomes hard to do by hand.
# Without Column Names
df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data', header=None)
# Generating column names manually.
names = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'class']
df2 = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data', names=names)
Any help, will be appreciated.
Thanks.
.names files are unstructured; unfortunately, for this reason you have to open the file and extract the column names manually. Once you do so, you can add these names to a list. Given that you have multiple .data files and that they are in the same order, you can use a for loop to label the column names and read the data files simultaneously.
import pandas as pd

column_names = ["example1", "example2", "example3"]
data_list = []
data = ["link to the sourcefile/file.data",
        "link to the sourcefile/file.data",
        "link to the sourcefile/file.data"]
for file in data:
    df = pd.read_csv(file, names=column_names)
    data_list.append(df)
My CSV file contains double quotes ("), which ruin the file: when I import it using pandas, it treats all the columns as one value.
What I want is to change the following value in the column
4.7,3.2,1.3,.2,"Setosa"
to
4.7,3.2,1.3,.2,'Setosa'
Can't you use something like the below? Note that str.replace returns a new string rather than modifying in place:
string = string.replace('"', "'")
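A full pass over the file might look like this (the file name is a placeholder, and the sample line is written first only to make the sketch self-contained):

```python
# Hypothetical input standing in for the problem file.
with open("iris.csv", "w") as f:
    f.write('4.7,3.2,1.3,.2,"Setosa"\n')

# Read the whole file, swap the quote characters, and write it back.
with open("iris.csv") as f:
    text = f.read()
text = text.replace('"', "'")

with open("iris.csv", "w") as f:
    f.write(text)
```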
Also, how can I extract only 30 .txt files from each folder?
As a beginner I find it easy to read a single CSV file, but I have not been able to make this approach work.
For a start, you could create an empty DataFrame and set up the required columns and the data type for each. Then it would be better to name the 30 .txt files sequentially, so that you can loop through them and load each one. Do the same for the folder names, or create a list of paths and traverse the list.
In addition to that, you might want to change the format to .csv instead, and use comma-separated values to specify the data.
For every text file, append the data to the DataFrame that you made in the beginning.
The pseudo-code for that would be something like this:
# create DataFrame
# add column and set data type
# create list of folder paths
# for every folder:
# for i in range(30):
# read data from "text" + str(i) + ".csv"
# append data to DataFrame
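The pseudo-code above could be fleshed out roughly as follows. The folder names are hypothetical, the sample files are created only so the sketch runs on its own, and I use 3 files per folder in place of the question's 30:

```python
import os
import pandas as pd

# Hypothetical layout: two folders, each holding sequentially numbered files.
folders = ["folder_a", "folder_b"]
for folder in folders:
    os.makedirs(folder, exist_ok=True)
    for i in range(3):
        with open(os.path.join(folder, f"text{i}.csv"), "w") as f:
            f.write("x,y\n1,2\n")

# Loop through the folders and the numbered files,
# collecting every DataFrame in a list.
frames = []
for folder in folders:
    for i in range(3):
        frames.append(pd.read_csv(os.path.join(folder, f"text{i}.csv")))

# One concat at the end is cheaper than appending to a DataFrame row by row.
master = pd.concat(frames, ignore_index=True)
```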