Making .RData file from Excel sheet - excel

How can I save data from an Excel sheet to an .RData file in R? I want to use one of the packages in R and load my dataset with data(dataset). I think I have to save the data as an .RData file and then load that into the package. My data is currently in an Excel spreadsheet.
My Excel sheet has column names like x, y, time.lag.
I have saved it as .csv.
Then I use:
x <- read.csv('filepath', header=TRUE)
Then I call
data(x)
and it reports that dataset 'x' is not found.

There are also several packages that allow reading directly from XLS and XLSX files. We've even had a question on that topic here and here, for example. However you decide to read in the data, saving it to an .RData file can be handled with save, save.image, saveRDS and probably some others I'm not thinking about.

Save your Excel data as a .csv file and import it using read.csv() or read.table().
The help page for each explains the options.
For example, if you have a file called myFile.xls, save it as myFile.csv.
library(BBMM)
# load an example dataset from BBMM
data(locations)
# from the BBMM help file
BBMM <- brownian.bridge(x=locations$x, y=locations$y, time.lag=locations$time.lag[-1], location.error=20, cell.size=50)
bbmm.summary(BBMM)
# output of bbmm.summary(BBMM)
Brownian motion variance : 3003.392
Size of grid : 138552 cells
Grid cell size : 50
# substitute myData for locations to use the dataset that you have read from the myFile.csv file
myData <- read.csv(file='myFile.csv', header=TRUE)
head(myData) # will show the first 6 rows of your imported data
# use whatever you need from the BBMM package now ....

Check the RODBC package. You can find an example in R Data Import/Export. You can query data from an Excel sheet as if from a database table.
The benefit of reading an Excel sheet with RODBC is that you get dates (if you work with any) in a proper format. With an intermediate CSV, you'd need to specify the column type, unless you want it to be a factor or string. You can also query only a portion of your data if you need to, making subset() unnecessary.

Related

Convert excel 1 column to multiple csv column

I have a question.
I have Excel data like this:
input file
It covers more than 500 people.
I want to convert the data to CSV:
expected csv result
The age is not always in the second row; sometimes it is in the third row. Names can be duplicated.
I'm really confused. Can I use an Excel feature to do this, or do I need some code?
I uploaded the file: https://ufile.io/rxe1l
Add name, age, add, etc. as columns in Excel and export as .csv.
In your Excel workbook, switch to the File tab, and then click Save As. Alternatively, you can press F12 to open the same Save As dialog.
In the Save as type box, choose to save your Excel file as CSV (Comma delimited).
Please simply follow this link: conversion of Excel into CSV

Load data from excel to matlab

I'm trying to load data from Excel into MATLAB. I downloaded the data from Yahoo Finance and want to load it into MATLAB, but it is not working. Below are my code and the error message MATLAB returns. Can somebody help me improve my code?
load SP100Duan.csv
Error using load
Number of columns on line 18 of ASCII file C:\Users\11202931\Desktop\SP100Duan.csv
must be the same as previous lines.
There are a ton of ways to get data into MATLAB. Generally speaking, mixed text and numbers give MATLAB problems. You may want to clean up the Excel file so there is no text in columns that should be numbers. Some possible methods to load data:
Use the readtable function, e.g. mytable = readtable('mycsvfile.csv'). All your data is put into a table datatype, which I personally find convenient.
You can read data directly from an Excel file using the xlsread function. From your description, though, it sounds like your file is a .csv file.
Use the csvread function.
You can often copy and paste data from Excel directly into a variable in MATLAB, e.g. (i) type x = 0, then (ii) double-click the x variable in your workspace, then (iii) copy the data from Excel and paste it into your x variable, then (iv) execute something like save mydata.mat so you can later load it with load mydata.mat.
This should be pretty simple . . .
xlsread('C:\Apple.xls', 'Sheet1', 'A1:G10')
Also, please see this link.
http://www.mathworks.com/help/matlab/ref/xlsread.html
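The error above says that line 18 has a different number of columns from the lines before it. Before retrying the import, it can help to list every such line. The following is a minimal sketch in Python rather than MATLAB, assuming a comma-delimited file; the function name find_ragged_rows is hypothetical and not part of any library.

```python
import csv
from collections import Counter


def find_ragged_rows(path, delimiter=","):
    """Return (expected_width, offenders) for a delimited text file.

    expected_width is the most common number of fields per row;
    offenders is a list of (line_number, width) for rows that differ.
    """
    widths = []
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        for row in reader:
            # reader.line_num is the physical line where the row ended
            widths.append((reader.line_num, len(row)))
    if not widths:
        return 0, []
    expected = Counter(w for _, w in widths).most_common(1)[0][0]
    return expected, [(n, w) for n, w in widths if w != expected]
```

Calling find_ragged_rows('SP100Duan.csv') would return the usual column count and the line numbers that break it, which are the lines to clean up (or skip) before loading the file.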

Can NetLogo Read an Excel File Format?

I've been using individual lists of data to update variables in my ABM. Unfortunately, due to the size of the data I am using now, it is becoming time-consuming to build the lists and then the tables. The data for these tables changes frequently, so it is not just a one-time thing.
I am hoping to get some ideas for a way to create a table that can be read directly from an Excel spreadsheet, without spending the time to build the table explicitly by inputting the individual lists. My table includes one list of keys (over 1000 keys) and nearly a hundred variables corresponding to each key, which must be updated when the key is called. The data is produced by a different model (not an ABM), which outputs an Excel spreadsheet with keys (X values) and values (Y values). Something like:
X1 Y1,1 Y1,2 Y1,3… Y1,100
X2 Y2,1 Y2,2 Y2,3… Y2,100
…..
X1000 Y1000,1 Y1000,2 Y1000,3…. Y1000,100
If anyone has a faster method for getting large amounts of data from Excel into a NetLogo table, I would be very appreciative.
There are two solutions, assuming you do not want to write an extension. You can save the Excel file as CSV and then either
write a NetLogo procedure to read your CSV file (to get started, see http://netlogoabm.blogspot.com/2014/01/reading-from-csv-file.html)
or
use a scripting language (Python recommended) to read the CSV file and then write out a .nls file with code for creating the table.
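The second option could look roughly like the sketch below. It assumes each CSV row holds a key in the first field followed by that key's values, and that the model declares extensions [table] and pulls in the generated file with __includes. The function names and the reporter name build-data-table are made up for illustration.

```python
import csv


def nl_literal(text):
    """Render a CSV field as a NetLogo literal.

    Fields that parse as numbers stay bare; anything else becomes a
    double-quoted string with backslashes and quotes escaped.
    """
    try:
        float(text)
        return text.strip()
    except ValueError:
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'


def csv_to_nls(csv_path, nls_path, reporter_name="build-data-table"):
    """Write a .nls file defining a reporter that builds the table.

    Each CSV row becomes one table:put call mapping the key to a list
    of its values. Returns the number of keys written.
    """
    lines = [f"to-report {reporter_name}", "  let t table:make"]
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue  # skip blank lines
            key, values = row[0], row[1:]
            value_list = " ".join(nl_literal(v) for v in values)
            lines.append(f"  table:put t {nl_literal(key)} [{value_list}]")
    lines += ["  report t", "end"]
    with open(nls_path, "w") as out:
        out.write("\n".join(lines) + "\n")
    return len(lines) - 4
```

Re-running the script whenever the other model produces a new spreadsheet regenerates the .nls file, so nothing has to be typed into NetLogo by hand.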

Excel CSV file with more than 1,048,576 rows of data

I have been given a CSV file with more rows than Excel can handle, and I really need to be able to see all the data. I understand and have tried the method of "splitting" it, but it doesn't work.
Some background: the CSV file is an Excel CSV file, and the person who gave me the file says it has about 2 million rows of data.
When I import it into Excel, I get data up to row 1,048,576. When I then re-import it in a new tab starting at row 1,048,577 of the data, it only gives me one row, and I know for a fact that there should be more (not only because "the person" said there are more than 2 million, but also because of the information in the last few sets of rows).
I thought the reason might be that I was given the CSV file as an Excel CSV file, so all the information past row 1,048,576 was lost (?).
Do I need to ask for a file in an SQL database format?
You should try Delimit. It can open up to 2 billion rows and 2 million columns very quickly, and it has a free 15-day trial too. Does the job for me!
I would suggest loading the .CSV file into MS Access.
With MS Excel you can then create a data connection to this source (without actually loading the records into a worksheet) and create a connected pivot table. You can then have a virtually unlimited number of lines in your table (depending on processor and memory: I now have 15 million lines with 3 GB of memory).
An additional advantage is that you can now create an aggregate view in MS Access. In this way you can create overviews from hundreds of millions of lines and then view them in MS Excel (beware of the 2 GB size limit on an Access database file).
Excel 2007+ is limited to somewhat over 1 million rows ( 2^20 to be precise), so it will never load your 2M line file. I think that the technique you refer to as splitting is the built-in thing Excel has, but afaik that only works for width problems, not for length problems.
The easiest way I see right away is to use a file-splitting tool (there are tons of them) and then load the resulting partial CSV files into multiple worksheets.
ps: "excel csv files" don't exist, there are only files produced by Excel that use one of the formats commonly referred to as csv files...
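If no splitting tool is at hand, a short script does the same job. This is a minimal sketch in Python, assuming the first line is a header that should be repeated in every part. The function name split_csv and the _partN file naming are illustrative choices.

```python
import csv
import itertools
from pathlib import Path

EXCEL_MAX_ROWS = 1_048_576  # 2**20 rows per worksheet


def split_csv(path, rows_per_file=EXCEL_MAX_ROWS - 1, has_header=True):
    """Split a CSV into numbered part files next to the original.

    The default chunk size leaves one row free for the header.
    Returns a list of (part_path, data_row_count) tuples.
    """
    src = Path(path)
    parts = []
    with src.open(newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None) if has_header else None
        for index in itertools.count(1):
            # Pull the next chunk; only one chunk is held in memory at a time
            chunk = list(itertools.islice(reader, rows_per_file))
            if not chunk:
                break
            part = src.with_name(f"{src.stem}_part{index}.csv")
            with part.open("w", newline="") as out:
                writer = csv.writer(out)
                if header is not None:
                    writer.writerow(header)
                writer.writerows(chunk)
            parts.append((part, len(chunk)))
    return parts
```

The row counts it returns also show how many rows the file really contains, which helps when checking the "about 2 million rows" claim before opening anything in Excel.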
You can use PowerPivot to work with files of up to 2GB, which will be enough for your needs.
First you want to change the file format from csv to txt. That is simple to do: just edit the file name and change csv to txt. (Windows will warn you about possibly corrupting the data, but it is fine, just click OK.) Then make a copy of the txt file so that you now have two files, both with 2 million rows of data. Then open the first txt file, delete the second million rows and save the file. Then open the second txt file, delete the first million rows and save the file. Now change the two files back to csv the same way you changed them to txt originally.
I'm surprised no one mentioned Microsoft Query. You can simply request data from the large CSV file as you need it by querying only that which you need. (Querying is setup like how you filter a table in Excel)
Better yet, if one is open to installing the Power Query add-in, it's super simple and quick. Note: Power Query is an add-in for 2010 and 2013 but comes with 2016.
If you have Matlab, you can open large CSV (or TXT) files via its import facility. The tool gives you various import format options including tables, column vectors, numeric matrix, etc. However, with Matlab being an interpreter package, it does take its own time to import such a large file and I was able to import one with more than 2 million rows in about 10 minutes.
The tool is accessible via Matlab's Home tab by clicking on the "Import Data" button.
Once imported, the data appears in the Workspace on the right-hand side, where it can be double-clicked to view it in an Excel-like format and even plotted in different ways.
I was able to edit a large 17GB csv file in Sublime Text without issue (line numbering makes it a lot easier to keep track of manual splitting), and then dump it into Excel in chunks smaller than 1,048,576 lines. Simple and quite quick - less faffy than researching into, installing and learning bespoke solutions. Quick and dirty, but it works.
Try PowerPivot from Microsoft. Here you can find a step by step tutorial. It worked for my 4M+ rows!
"DO I need to ask for a file in an SQL database format?" YES!!!
Using a database is the best option for this problem.
Excel 2010 specifications.
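As one way to try the database route without asking for a different file, the CSV can be loaded into SQLite and queried from there. The sketch below is in Python and assumes the first row holds plain column names. The function and table names are hypothetical, and every value is stored as text.

```python
import csv
import sqlite3


def load_csv_into_sqlite(csv_path, db_path, table="data"):
    """Load a CSV file (header in the first row) into a SQLite table.

    Columns are created without types, so values are stored as text.
    Returns the number of rows in the table after loading.
    """
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        columns = ", ".join(f'"{name}"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn = sqlite3.connect(db_path)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
                # executemany consumes the reader row by row
                conn.executemany(
                    f'INSERT INTO "{table}" VALUES ({placeholders})', reader
                )
            return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        finally:
            conn.close()
```

Once loaded, filters and aggregates run as SQL queries, and only the much smaller results need to go into Excel.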
Use MS Access. I have a file of 2,673,404 records. It will not open in Notepad++, and Excel will not load more than 1,048,576 records. It is tab-delimited since I exported the data from a MySQL database, and I need it in CSV format. So I imported it into Access. Change the file extension to .txt so MS Access will take you through the import wizard.
MS Access will link to your file, so keep the CSV file for the database to stay intact.
The best way to handle this (with ease and no additional software) is with Excel - but using Powerpivot (which has MSFT Power Query embedded). Simply create a new Power Pivot data model that attaches to your large csv or text file. You will then be able to import multi-million rows into memory using the embedded X-Velocity (in-memory compression) engine. The Excel sheet limit is not applicable - as the X-Velocity engine puts everything up in RAM in compressed form. I have loaded 15 million rows and filtered at will using this technique. Hope this helps someone... - Jaycee
I found this thread while researching the subject.
There is a way to copy all this data to an Excel worksheet.
(I had this problem before with a 50 million line CSV file.)
If the data needs any formatting, additional code could be added.
Try this.
Sub ReadCSVFiles()
    ' Read a text/CSV file line by line, one line per cell, and move to
    ' the next column once the worksheet's row limit is reached.
    Dim i As Long, j As Long
    Dim UserFileName As String
    Dim strTextLine As String
    Dim iFile As Integer: iFile = FreeFile
    UserFileName = Application.GetOpenFilename
    Open UserFileName For Input As #iFile
    i = 1
    j = 1
    Do Until EOF(iFile)
        Line Input #iFile, strTextLine
        If i > 1048576 Then
            ' Column is full: continue at the top of the next column
            i = 1
            j = j + 1
        End If
        Sheets(1).Cells(i, j) = strTextLine
        i = i + 1
    Loop
    Close #iFile
End Sub
You can try downloading and installing TheGun Text Editor, which can help you open a large CSV file easily.
You can check detailed article here https://developingdaily.com/article/how-to/what-is-csv-file-and-how-to-open-a-large-csv-file/82
Split the CSV into two files in Notepad. It's a pain, but you can just edit each of them individually in Excel after that.

Is there such thing as a spreadsheet object/library

I'm trying to generate some reporting from existing reports in CSV format. These CSV files don't contain just data, but report name, report date, multiple data sets- each line doesn't necessarily contain the same number of fields or consistent data per row.
I was curious if there was some spreadsheet type library available, this is how I would imagine it to work.
load some csv file into spreadsheet
report_title = spreadsheet("A1")
report_date = spreadsheet("B2")
sales_data_spreadsheet = spreadsheet("A6:E22")
sales_total = sales_data_spreadsheet("SUM(E1:E17)")
expenses_data_spreadsheet = spreadsheet("A26:E38")
expenses_total = expenses_data_spreadsheet("SUM(E1:E11)")
Microsoft Excel?
You don't have spreadsheets, you have described flat files with mixed formats and some metadata. What in your flat files says that the sales data is in A6:E22? In fact what does A6:E22 mean outside the context of Microsoft Excel and in the context of your data?
There are lots of ways for you to handle this data from parsing it yourself long-hand and supplying code to manage the data formats to loading it into a set of database tables and using SQL to break it into pieces. Which you choose depends on what the data is, where it comes from and what you are going to do with it. If you provide a bit more of that sort of information a choice of approach may be easier to recommend.
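For the "parse it yourself" route, a thin wrapper over the csv module gets close to the interface imagined in the question. The sketch below treats the ragged CSV as a grid addressed by spreadsheet-style references. The class CsvSheet and its methods are hypothetical, not an existing library, and missing cells read as empty strings.

```python
import csv
import re


class CsvSheet:
    """Read-only grid over a ragged CSV file, addressed like A1 or A6:E22."""

    def __init__(self, path):
        with open(path, newline="") as f:
            self.rows = list(csv.reader(f))

    @staticmethod
    def _parse(ref):
        """Convert a reference such as 'B2' to zero-based (row, col)."""
        match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", ref)
        if not match:
            raise ValueError(f"bad cell reference: {ref!r}")
        col = 0
        for ch in match.group(1).upper():
            col = col * 26 + (ord(ch) - ord("A") + 1)
        return int(match.group(2)) - 1, col - 1

    def _at(self, r, c):
        # Rows may have different lengths; treat missing cells as empty
        if r < len(self.rows) and c < len(self.rows[r]):
            return self.rows[r][c]
        return ""

    def cell(self, ref):
        return self._at(*self._parse(ref))

    def block(self, ref):
        """Return the rectangle 'A6:E22' as a list of rows."""
        start, end = ref.split(":")
        (r1, c1), (r2, c2) = self._parse(start), self._parse(end)
        return [[self._at(r, c) for c in range(c1, c2 + 1)]
                for r in range(r1, r2 + 1)]

    def total(self, ref):
        """Sum the non-empty cells of a range, read as numbers."""
        return sum(float(v) for row in self.block(ref) for v in row if v.strip())
```

With this, report_title = sheet.cell("A1") and sales_total = sheet.total("E6:E22") read much like the pseudocode in the question, although the question of how the script knows which range holds the sales data still has to be answered separately.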
I've authored such a library for Python and the Google Spreadsheets API. The interface of the library is not exactly the same as in your example, but it uses objects to represent sheets and cells.
The library's API is pretty straightforward even if you're new to Python.
First, you have to upload your CSVs into Google Spreadsheets, and then you can access them:
# Load the module
import gspread
# Log in with your Google account
# (gspread.login comes from older gspread releases; newer releases authenticate differently, e.g. via gspread.service_account())
gc = gspread.login('_your_google_account_email_','password')
# Open a spreadsheet and select its first worksheet
wks = gc.open("name of the spreadsheet").sheet1
# Update a single cell
wks.update_acell('B2', "it's down there somewhere, let me take another look.")
# Select a range and convert its cell values to numbers
sales_data = [float(c.value) for c in wks.range('E1:E17')]
# Sum it up
sales_total = sum(sales_data)
Alternatively, you can import your CSV data into an Excel file and use this wonderful Python library: xlrd

Resources