I'm attempting to write the data frame below to a CSV file and an Excel file.
df = data.frame(a = 1:4,b = c("TRUE","FALSE","true","false"))
data.table::fwrite(df,"test_csv.csv",quote = T)
openxlsx::write.xlsx(df,"test_excel.xlsx")
Upon opening the files in Microsoft Excel, I observe that the case of the values in the second column has automatically changed in the CSV file (while the data in the .xlsx file is intact). Why does this happen, and how do I prevent the CSV data from being changed erroneously?
Please refer to the screenshot below for more clarity.
My current code is below.
I have a specific range of cells (from a specific sheet) that I am pulling out of multiple (~30) Excel files. I am trying to pull this information out of all these files and compile it into a single new file, appending to that file each time. I'm going to clean up the destination file manually for the time being, as I will improve this script going forward.
What I currently have works fine for a single sheet, but I overwrite my destination every time I add a new file to the read-in list.
I've tried adding mode='a' and a couple of different ways to concat at the end of my function.
import pandas as pd

def excel_loader(fname, sheet_name, new_file):
    xls = pd.ExcelFile(fname)
    df1 = pd.read_excel(xls, sheet_name, nrows=20)
    print(df1[1:15])
    writer = pd.ExcelWriter(new_file)
    df1.insert(51, 'Original File', fname)
    df1.to_excel(new_file)
names = ['sheet1.xlsx', 'sheet2.xlsx']
destination = 'destination.xlsx'
for name in names:
    excel_loader(name, 'specific_sheet_name', destination)
Thanks in advance for any help; I can't seem to find an answer to this exact situation on here. Cheers.
Ideally you want to loop through the files and read the data into a list, then concatenate the individual dataframes, then write the new dataframe out once. This assumes the data being pulled is the same size/shape and the sheet name is the same in every file. If the sheet name changes, look into the zip() function to pair filename/sheet-name tuples.
This should get you started:
names = ['sheet1.xlsx', 'sheet2.xlsx']
destination = 'destination.xlsx'

# read all files first
df_hold_list = []
for name in names:
    xls = pd.ExcelFile(name)
    df = pd.read_excel(xls, 'specific_sheet_name', nrows=20)
    df_hold_list.append(df)

# concatenate dfs
df1 = pd.concat(df_hold_list, axis=0)  # axis is 0 or 1 depending on how you want to concatenate (vertical vs horizontal)

# write the combined frame out once (no need for a separate ExcelWriter here)
df1.to_excel(destination, index=False)
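If the sheet name does differ per file, the zip() pairing mentioned above might look like the following sketch. The file names, sheet names, and sample data here are placeholders for illustration; in practice the loop would run over your ~30 source workbooks.

```python
import pandas as pd

# Build two tiny demo workbooks so the sketch is runnable end-to-end
# (stand-ins for the real source files).
pd.DataFrame({'a': [1, 2]}).to_excel('file1.xlsx', sheet_name='SheetA', index=False)
pd.DataFrame({'a': [3, 4]}).to_excel('file2.xlsx', sheet_name='SheetB', index=False)

# zip() pairs each workbook with its own sheet name
names = ['file1.xlsx', 'file2.xlsx']
sheets = ['SheetA', 'SheetB']          # assumption: sheet names differ per file

df_hold_list = []
for fname, sheet in zip(names, sheets):
    df = pd.read_excel(fname, sheet_name=sheet, nrows=20)
    df['Original File'] = fname        # keep track of the source workbook
    df_hold_list.append(df)

# stack vertically and write the combined frame out once
df1 = pd.concat(df_hold_list, axis=0, ignore_index=True)
df1.to_excel('destination.xlsx', index=False)
```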
I have 20 CSV files pertaining to different individuals.
And I have a main CSV file, which is based on the final row values in specific columns. Below are samples of both kinds of files.
All Individual Files look like this:
alex.csv
name,day,calls,closed,commision($)
alex,25-05-2019,68,6,15
alex,27-05-2019,71,8,20
alex,28-05-2019,65,7,17.5
alex,29-05-2019,68,8,20
stacy.csv
name,day,calls,closed,commision($)
stacy,25-05-2019,82,16,56.00
stacy,27-05-2019,76,13,45.50
stacy,28-05-2019,80,19,66.50
stacy,29-05-2019,79,18,63.00
But the main file (single-day report), which is the output file, looks like this:
name,day,designation,calls,weekly_avg_calls,closed,commision($)
alex,29-05-2019,rep,68,67,8,20
stacy,29-05-2019,sme,79,81,18,63
madhu,29-05-2019,rep,74,77,16,56
gabrielle,29-05-2019,rep,59,61,6,15
I need to copy the required values from the columns (calls, closed, commision($)) of the last line for the end-of-day report, and then populate them into the main file (a template that already has some columns filled in, like name, day, designation, ...).
So, how can I write a for or while loop over all the CSV files in the Employee_performance_DB list?
Employee_performance_DB = ['alex.csv', 'stacy.csv', 'poduzav.csv', 'ankit.csv', .... .... .... 'gabrielle.csv']

for employee_db in Employee_performance_DB:
    read_object = pd.read_csv(employee_db)
    read_object2 = read_object.tail(1)
    read_object2.to_csv("Main_Report.csv", header=False, index=False,
                        columns=["calls", "closed", "commision($)"], mode='a')
How can I copy the values of calls, closed, and commision($) from the Employee_performance_DB list of files into the exact columns in Main_Report.csv for those exact employees?
Well, as I had no answers to this, it took a while for me to find a solution.
The code below fixed my issue...
# Create a list of all the files in "employees_list"
employees_list = ['alex.csv', ......, 'stacy.csv']

for employees in employees_list:
    read_object = pd.read_csv(employees)
    read_object2 = read_object.tail(1)
    read_object2.to_csv("Employee_performance_DB.csv", index=False, mode='a', header=False)
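Appending rows this way does not line the values up with the right employee in the template. A merge-based sketch that aligns the last row of each file with the matching name in the main report might look like the following (the tiny inline frames are stand-ins for the real files; in practice you would read the existing template and loop over the full file list):

```python
import pandas as pd

# Tiny stand-ins for two employee files
pd.DataFrame({'name': ['alex'] * 2, 'calls': [65, 68], 'closed': [7, 8],
              'commision($)': [17.5, 20.0]}).to_csv('alex.csv', index=False)
pd.DataFrame({'name': ['stacy'] * 2, 'calls': [80, 79], 'closed': [19, 18],
              'commision($)': [66.5, 63.0]}).to_csv('stacy.csv', index=False)

# Stand-in for the template columns that are already filled in
main = pd.DataFrame({'name': ['alex', 'stacy'],
                     'day': ['29-05-2019'] * 2,
                     'designation': ['rep', 'sme']})

# Collect the last row of each employee file, then merge on name so each
# employee's numbers land on their own row of the report
last_rows = pd.concat(
    [pd.read_csv(f).tail(1) for f in ['alex.csv', 'stacy.csv']],
    ignore_index=True)
report = main.merge(last_rows[['name', 'calls', 'closed', 'commision($)']],
                    on='name', how='left')
report.to_csv('Main_Report.csv', index=False)
```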
I am trying to pull some data from a stock market API and save it in different Excel files. Every stock trades on different timeframes like 1m, 3m, 5m, 15m and so on.
I want to create an Excel file for each stock and a different sheet for each timeframe.
My code creates an Excel file for a stock (symbol), adds sheets to it (1m, 3m, 5m, ...), saves the file, then pulls the data from the stock market API and saves it into the correct sheet. For example, for ETH/BTC: create the file and sheets, pull the "1m" data, and save it into the "1m" sheet.
The code creates the file and sheets; I tested that part.
The problem is that after the dataframe is written into the Excel file, all the other sheets are deleted. I tried pulling all the data for each symbol, but when I opened the Excel file only the last timeframe (1w) had been written and all the other sheets were gone. So please help.
I checked other questions but didn't find this same problem. In the last part I am not trying to add a new sheet; I am trying to save the df to an existing sheet.
# get_bars function pulls the data
def get_bars(symbol, interval):
    .
    .
    .
    return df

...

timeseries = ['1m','3m','5m','15m','30m','1h','2h','4h','6h','12h','1d','1w']

from pandas import ExcelWriter
from openpyxl import load_workbook

for symbol in symbols:
    file = ('C:/Users/mi/Desktop/Kripto/' + symbol + '.xlsx')
    workbook = xlsxwriter.Workbook(file)
    workbook.close()
    wb = load_workbook(file)
    for x in range(len(timeseries)):
        ws = wb.create_sheet(timeseries[x])
    print(wb.sheetnames)
    wb.save(file)
    workbook.close()

    xrpusdt = get_bars(symbol, interval='1m')
    writer = pd.ExcelWriter(file, engine='xlsxwriter')
    xrpusdt.to_excel(writer, sheet_name='1m')
    writer.save()
I think instead of defining the ExcelWriter as a variable, you need to use it in a with statement and use append mode, since you have already created an Excel file using xlsxwriter, like below:
for x in range(len(timeseries)):
    xrpusdt = get_bars(symbol, interval=timeseries[x])
    with pd.ExcelWriter(file, engine='openpyxl', mode='a') as writer:
        xrpusdt.to_excel(writer, sheet_name=timeseries[x])
And in your code above, you're using a static "1m" interval for the xrpusdt variable, which this code replaces with the timeframe variable.
Resources:
Pandas ExcelWriter: here you can see the use-case of append mode https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.ExcelWriter.html#pandas.ExcelWriter
Pandas df.to_excel: here you can see how to write to more than one sheet
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html
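As the to_excel docs above describe, several frames can also go to separate sheets through a single writer context, which avoids the overwrite problem entirely. A minimal runnable sketch (demo data standing in for get_bars(), and a local demo.xlsx path rather than the real output file):

```python
import pandas as pd

timeseries = ['1m', '3m', '5m']
# Stand-in for get_bars(): a tiny frame per interval
frames = {tf: pd.DataFrame({'close': [1.0, 2.0]}) for tf in timeseries}

# One writer context keeps every sheet in the same workbook
with pd.ExcelWriter('demo.xlsx', engine='openpyxl') as writer:
    for tf, df in frames.items():
        df.to_excel(writer, sheet_name=tf, index=False)

# Read back all sheets to confirm none were lost
book = pd.read_excel('demo.xlsx', sheet_name=None)
```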
I am a newbie in MATLAB and am stuck on this problem. I am trying to make one new file from multiple Excel files using MATLAB code. It manages to produce the new file; however, the file is a mess and I really do not have any idea how to fix it. Here is the code:
% Merge multiple XLS files into one XLS file
[filenames, folder] = uigetfile('*.xls','Select the data file','MultiSelect','on'); % gets directory from any folder
% Create output file name in the same folder.
outputFileName = fullfile(folder, 'rainfall.xls');
fidOutput = fopen(outputFileName, 'wt'); % open output file to write
for k = 1 : length(filenames)
    % Get this file name.
    thisFileName = fullfile(folder, filenames{k});
    % Open input file:
    fidInput = fopen(thisFileName);
    % Read text from it
    thisText = fread(fidInput, '*char');
    % Copy to output file:
    fwrite(fidOutput, thisText);
    fclose(fidInput); % close the input file
end
fclose(fidOutput);
I attach a picture showing how messy the resulting data is. Could you please help me? Thank you very much.
[files,folder] = uigetfile('*.xls','Select Files','MultiSelect','on');
output = fullfile(folder,'rainfall.xls');
c = cell(0,5);
for i = 1:numel(files)
    c_curr = table2cell(readtable(fullfile(folder,files{i}),'ReadVariableNames',false));
    c = [c; c_curr];
end
tab = cell2table(c,'VariableNames',{'MyVar1' 'MyVar2' 'MyVar3' 'MyVar4' 'MyVar5'});
writetable(tab,output);
Of course, every file must contain the same number of columns, and every column must have the same underlying data type across all the files.
Use xlsread (or readtable, if you have a recent version of MATLAB) instead of fread: .xls files are binary, so concatenating their raw bytes with fread/fwrite produces garbage. Hope this helps.
I tried loading the baseball statistics from this link. When I read it from the file using
data <- read.csv("MLB2011.csv")
it seems to read all fields as factor values. I tried dropping those factors by doing:
read.csv("MLB2011.xls", as.is= FALSE)
.. but it looks like the values are still being read as factors. What can I do to have them loaded as plain character values and not factors?
You aren't reading a CSV file; it is an Excel spreadsheet (.xls format). It contains two worksheets, bat2011 and pitch2011.
You could use the XLConnect library to read this:
library(XLConnect)
# load the work book (connect to the file)
wb <- loadWorkbook("MLB2011.xls")
# read in the data from the bat2011 sheet
bat2011 <- readWorksheet(wb, sheet = 'bat2011')
readWorksheet has an argument colTypes which you could use to specify the column types.
Edit
If you have already saved the sheets as csv files then
as.is = TRUE or stringsAsFactors = FALSE will be the correct argument values