Removing levels from data frame read from CSV file - R

Removing levels from data frame read from CSV file - R - string

I tried loading the baseball statistics from this link. When I read it from the file using
data <- read.csv("MLB2011.csv")
it seems to be reading all fields as factor values. I tried dropping those factor values by doing:
read.csv("MLB2011.xls", as.is= FALSE)
.. but it looks like the values are still being read as factors. What can I do to have them loaded as simple character values and not factors?

You aren't reading a csv file, it is an excel spreadsheet (.xls format). It contains two worksheets bat2011 and pitch2011
You could use the XLConnect library to read this
library(XLConnect)
# load the work book (connect to the file)
wb <- loadWorkbook("MLB2011.xls")
# read in the data from the bat2011 sheet
bat2011 <- readWorksheet(wb, sheet = 'bat2011')
readWorksheet has an argument colType which you could use to specify the column types.
Edit
If you have already saved the sheets as csv files then
as.is = TRUE or stringsAsFactors = FALSE will be the correct argument values

Related

Custom Filepath Exporting a Pandas Dataframe

I am working with financial data, and I am cleaning the data in Python before exporting it as a CSV. I want this file to be reused, so I want to make sure that the exported files are not overwritten. I am including this piece of code to help with this:
# Fill this out; this will help identify the dataset after it is exported
latestFY = '21'
earliestFY = '19'
I want the user to change the earliest and latest fiscal year variables to reflect the data they are working with, so when the data is exported, it is called financialData_FY19_FY21, for example. How can I do this using the to_csv function?
Here is what I currently have:
mergedDF.to_csv("merged_financial_data_FY.csv", index = False)
Here is what I want the file path to look like: financialData_FY19_FY21 where the 19 and 21 can be changed based on the input above.

You can use an f-string to update the strings that will be your file paths.
latestFY = '21'
earliestFY = '19'
filename = f"merged_financial_data_FY{earliestFY}_{latestFY}.csv"
mergedDF.to_csv(filename, index=False)
Link to docs

Appending data from multiple excel files into a single excel file without overwriting using python pandas

Here is my current code below.
I have a specific range of cells (from a specific sheet) that I am pulling out of multiple (~30) excel files. I am trying to pull this information out of all these files to compile into a single new file appending to that file each time. I'm going to manually clean up the destination file for the time being as I will improve this script going forward.
What I currently have works fine for a single sheet but I overwrite my destination every time I add a new file to the read in list.
I've tried adding the mode = 'a' and a couple different ways to concat at the end of my function.
import pandas as pd
def excel_loader(fname, sheet_name, new_file):
xls = pd.ExcelFile(fname)
df1 = pd.read_excel(xls, sheet_name, nrows = 20)
print(df1[1:15])
writer = pd.ExcelWriter(new_file)
df1.insert(51, 'Original File', fname)
df1.to_excel(new_file)
names = ['sheet1.xlsx', 'sheet2.xlsx']
destination = 'destination.xlsx'
for name in names:
excel_loader(name, 'specific_sheet_name', destination)
Thanks for any help in advance can't seem to find an answer to this exact situation on here. Cheers.

Ideally you want to loop through the files and read the data into a list, then concatenate the individual dataframes, then write the new dataframe. This assumes the data being pulled is the same size/shape and the sheet name is the same. If sheet name is changing, look into zip() function to send filename/sheetname tuple.
This should get you started:
names = ['sheet1.xlsx', 'sheet2.xlsx']
destination = 'destination.xlsx'
#read all files first
df_hold_list = []
for name in names:
xls = pd.ExcelFile(name)
df = pd.read_excel(xls, sheet_name, nrows = 20)
df_hold_list.append(df)
#concatenate dfs
df1 = pd.concat(df_hold_list, axis=1) # axis is 1 or 0 depending on how you want to cancatenate (horizontal vs vertical)
#write new file - may have to correct this piece - not sure what functions these are
writer = pd.ExcelWriter(destination)
df1.to_excel(destination)

Pandas Copy Values from Rows to other files without disturbing the existing data

I have 20 csv files pertaining to different individuals.
And I have a Main csv file, which is based on the final row values in specific columns. Below are the sample for both kinds of files.
All Individual Files look like this:
alex.csv
name,day,calls,closed,commision($)
alex,25-05-2019,68,6,15
alex,27-05-2019,71,8,20
alex,28-05-2019,65,7,17.5
alex,29-05-2019,68,8,20
stacy.csv
name,day,calls,closed,commision($)
stacy,25-05-2019,82,16,56.00
stacy,27-05-2019,76,13,45.50
stacy,28-05-2019,80,19,66.50
stacy,29-05-2019,79,18,63.00
But the Main File(single day report), which is the output file, looks like this:
name,day,designation,calls,weekly_avg_calls,closed,commision($)
alex,29-05-2019,rep,68,67,8,20
stacy,29-05-2019,sme,79,81,18,63
madhu,29-05-2019,rep,74,77,16,56
gabrielle,29-05-2019,rep,59,61,6,15
I require to copy the required values from the columns(calls,closed,commision($)) of the last line, for end-of-today's report, and then populate it to the Main File(template that already has some columns filled like the {name,day,designation....}).
And so, how can I write a for or a while program, for all the csv files in the "Employee_performance_DB" list.
Employee_performance_DB = ['alex.csv', 'stacy.csv', 'poduzav.csv', 'ankit.csv' .... .... .... 'gabrielle.csv']
for employee_db in Employee_performance_DB:
read_object = pd.read_csv(employee_db)
read_object2 = read_object.tail(1)
read_object2.to_csv("Main_Report.csv", header=False, index=False, columns=["calls", "closed", "commision($)"], mode='a')
How to copy values of {calls,closed,commision($)} from the 'Employee_performance_DB' list of files to the exact column in the 'Main_Report.csv' for those exact empolyees?

Well, as I had no answers for this, it took a while for me to find a solution.
The code below fixed my issue...
# Created a list of all the files in "employees_list"
employees_list = ['alex.csv', ......, 'stacy.csv']
for employees in employees_list:
read_object = pd.read_csv(employees)
read_object2 = read_object.tail(1)
read_object2.to_csv("Employee_performance_DB.csv", index=False, mode='a', header=False)

R data.table fwrite irregularity

I'm attempting to write below dataframe to a csv and excel file.
df = data.frame(a = 1:4,b = c("TRUE","FALSE","true","false"))
data.table::fwrite(df,"test_csv.csv",quote = T)
openxlsx::write.xlsx(df,"test_excel.xlsx")
Upon opening the files in Microsoft Excel, I observe that case in 2nd column has automatically changed in CSV file(while data in .xlsx file is in order). Why does it happen and how do I avoid CSV file data get erroneously changed?
Please refer to below screenshot for more clarity

Reading multiple Excel files, averaging, and writing to a single Excel file in Matlab

I am trying to expand some code I've written. It might be useful to include that script below:
% importing single excel sheet
data = xlsread('test_file.xlsx');
% averaging durations (exluding NaNs)
average_rx_time = mean(reaction_time, 'omitnan');
average_reach_dur = mean(reach_dur, 'omitnan');
average_lift_dur = mean(lift_dur, 'omitnan');
average_hold_dur = mean(hold_dur, 'omitnan');
average_withdrawal_dur = mean(withdrawal_dur, 'omitnan');
% Excel file output containing daily averages
a = [average_rx_time, average_reach_dur, average_lift_dur, average_hold_dur, average_withdrawal_dur];
data_cells = num2cell(a);
column_headers ={'Rx Time', 'Reach Dur', 'Lift Dur', 'Hold Dur', 'Withdrawal Dur'};
row_headers(1,1) ={'Day 1'};
output = [{' '} column_headers; row_headers data_cells];
xlswrite('Test.xls', output);
This portion works. It reads a bunch of values in a single Excel sheet, averages some numbers, then prints those averages to another Excel sheet. What I need to do now is read several files from a directory (they all exist in one folder and are the same file type), average the same values in each file, then print them with their respective file name in the spreadsheet.
I think I should use a loop, but I'm not sure where to implement it. I'm also not sure how to read multiple Excel files while printing to the same one. Any ideas?
Thanks,
Mickey

Use the cd and dir functions to get a list of files in a specified folder:
cd('\path\to\folder'); % set MATLAB's current folder
fileList = dir('*.xlsx'); % get list of .xlsx files in a struct array
for fileNum = 1:length(fileList)
fileName = fileList(fileNum).name;
% do whatever you want with fileName here
end

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Removing levels from data frame read from CSV file - R - string

Related

Custom Filepath Exporting a Pandas Dataframe

Appending data from multiple excel files into a single excel file without overwriting using python pandas

Pandas Copy Values from Rows to other files without disturbing the existing data

R data.table fwrite irregularity

Reading multiple Excel files, averaging, and writing to a single Excel file in Matlab

Categories

Resources