Read a complicated Excel or CSV file into MATLAB

I have an Excel file that contains a mix of text and numerical values.
For instance, the file looks like this:
25 file1
26 file2
Here 25 is a numerical value in the first cell (row 1, column 1). "file1" represents the content of the second cell (row 1, column 2); it can be a short text composed of multiple paragraphs.
I want to load this Excel file into MATLAB and store it in a 2*2 matrix, where each matrix entry corresponds to one spreadsheet cell.
I tried xlsread, but it did not work. I also tried textscan, but it seems to handle only the scenario where a cell holds a single string. Here, the contents of some cells are entire text files.

If you are reading an Excel file using XLSREAD, you can use the third output argument to retrieve both the textual and the numeric data (unprocessed).
Example:
>> [~,~,raw] = xlsread('Book1.xls')
raw =
[25] 'hello world.'
[26] [1x38 char]
>> raw{2,2}
ans =
this is an example
of multi-line
text
Note that XLSREAD is limited to the capabilities of MS Excel to open/read files, so some especially large files (in my experience 1 million+ rows) will get only partially read.

Related

Data missing when importing a text file into excel

I'm trying to import a text file of CSV data into Excel. The data contains mostly integers, but there is one column with strings. I'm using the Data tab of Excel Professional Plus 2019. However, when I select comma as the delimiter, I lose 5 of the 16 columns, starting with the one containing strings. The data looks like the line below; the date and the 7 numbers are in their own columns (just whitespace separated). Can anyone help or explain? Many thanks.
2143, Wed, 6,Jul,2016, 38,20,03,39,01,24,04, 2198488, 0, Lancelot , 6
[Screenshots: before and after the import]
full data is on https://github.com/CH220/textfileforexcel
Your problem stems from the very first line of data in your text file:
40,03,52,02,07,20,14, 13137760, 1, Lancelot , 7
As you can see, there are only eleven "segments". Hence, when you try to use the import dialog to separate by comma, there will only be 11 columns even though subsequent rows have 16 columns.
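The mismatch is easy to confirm before importing. The following is a minimal Python sketch, not part of the Excel workflow itself; the helper name field_counts is hypothetical, and the two sample lines are the ones quoted in this thread.

```python
import csv
import io

def field_counts(text, delimiter=","):
    """Return the number of delimited fields on each non-empty line."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [len(row) for row in reader if row]

# First line of the file, followed by a typical later line (both from the thread).
sample = (
    "40,03,52,02,07,20,14, 13137760, 1, Lancelot , 7\n"
    "2143, Wed, 6,Jul,2016, 38,20,03,39,01,24,04, 2198488, 0, Lancelot , 6\n"
)

counts = field_counts(sample)
print(counts)  # → [11, 16]
```

If the first count is smaller than the largest one, an importer that sizes the table from the first line will drop the extra columns.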
Possible solutions:
1) Correct the text file so the first line has the desired number of segments.
2) Change the Import Dialog, as you did, to comma, then:
   a) Transform
   b) Edit the second line of the generated M-code to change from Columns=11 to Columns=16. You do this in the Advanced Editor:
      Source = Csv.Document(File.Contents("C:\Users\ron\Desktop\new 2.txt"),[Delimiter=",", Columns=16, Encoding=1252]),
3) Change the Fixed Width "argument" from 0,23 => 0, then:
   a) Transform
   b) Split Column by delimiter (using the comma) in Power Query.
To me, the "best" way would be to correct the text file.

How to make excel treat text as string in Clojure using data.csv?

I am using data.csv to write export data to a CSV file. However, I have some alphanumeric fields which are ids, but since they consist only of digits, Excel treats them as doubles and shows them in exponential form.
Is there a way to tell Excel to treat them as they are?
Excel displays long numbers in csv files in an abbreviated form with exponents.
Unfortunately there is no way to disable that functionality from within the generated csv.
Also sending it in as text shows the same abbreviated format. Your choices are:
1) Assuming the id number has fewer than 16 digits, you can go into Excel and change the cell format.
2) Alternatively, you can prepend an apostrophe or text character to your ids before you generate the CSV. For example:
(ns sample.core
  (:use [clojure.data.csv]
        [clojure.java.io]))

(defn gen-csv [filename]
  (with-open [out-file (writer filename)]
    (write-csv out-file
               [["'123000000" "'45612333"]
                ["'789909990" "'90099999124"]])))
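For readers who are not using Clojure, the same two points can be sketched in Python. This is an illustration only: the function name write_ids_as_text and the 17-digit sample id are hypothetical. The first part prefixes each id with an apostrophe, and the second shows why ids longer than about 15 digits cannot survive being read as a double.

```python
import csv
import io

def write_ids_as_text(rows):
    """Return CSV text in which every id is prefixed with an apostrophe."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        writer.writerow(["'" + str(value) for value in row])
    return buf.getvalue()

csv_text = write_ids_as_text([["123000000", "45612333"],
                              ["789909990", "90099999124"]])
print(csv_text)

# A double carries only about 15-16 significant decimal digits, so a
# 17-digit id does not round-trip through a floating-point value.
long_id = "12345678901234567"  # hypothetical id, for illustration
print(int(float(long_id)) == int(long_id))  # → False
```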

Reading mix between numeric and non-numeric data from excel into Matlab

I have a matrix where the first column contains dates and the first row contains maturities which are alpha/numeric (e.g. 16year).
The rest of the cells contain the rates for each day, which are double precision numbers.
Now I believe xlsread() can only handle numeric data so I think I will need something else or a combination of functions?
I would like to be able to read the table from excel into MATLAB as one array or perhaps a struct() so that I can keep all the data together.
The other problem is that some of the rates are given as '#N/A'. I want the cells where these values are stored to be kept but would like to change the value to blank=" ".
What is the best way to do this? Can it be done as part of the input process?
Well, looking at the MATLAB reference for xlsread, you can use the form
[num,txt,raw] = xlsread(FILENAME)
and then num will hold a matrix of your numeric data, txt the text data (i.e. your text headers), and raw all of your data unprocessed (including the text headers).
So I guess you could use the raw array, or a combination of num and txt.
For your other problem, if your rates are 'pulled' from some other source, you can use
=IFERROR(RATE DATA,"")
and then there will be a blank instead of the error code #N/A.
Another solution (Windows only) would be to use the xlsread form that runs a function on your imported data,
[num,txt,raw,custom] = xlsread(filename,sheet,xlRange,'',functionHandler)
and let the function replace the NaN values with blanks (you will then have your output in the custom array).

Truncating characters when importing with SAS

I have an Excel spreadsheet with company data and descriptions. Some of the cells basically contain mini-essays in them, pages and pages of straight text contained in a single cell. SAS has been giving me problems when I'm importing the file because it truncates some of the longer cells and the text gets cut off mid-sentence. Any ideas on how to avoid this? I've tried saving the file to a tab-delimited text file, but no luck.
Thanks!
Exporting to tab-delimited or csv may be the way to go, as you said. Be sure to have strings enclosed in quotes also. But do you have the length specified for the variable containing the long cells? According to SAS the maximum length is 32,767 characters, so perhaps try as large a number as it takes -- hopefully less than that.
Also specify lrecl (the maximum length of each line of the file) so that it is at least as long as the longest line in the file.
data test;
    length company_name $20 description1 description2 $10000;
    infile my_tab_dlm_file lrecl = 50000 dsd delimiter = '09'x;
    input company_name
          description1
          description2
    ;
run;
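One way to choose values for length and lrecl is to measure the exported file first. The Python sketch below is an illustration only, not SAS code. The function name measure_tab_text and the sample text are hypothetical. It counts characters, whereas SAS lengths are in bytes, so a multi-byte encoding would need extra headroom.

```python
import csv
import io

def measure_tab_text(text):
    """Return (longest line length, list of the widest field per column)."""
    longest_line = max((len(line) for line in text.splitlines()), default=0)
    widths = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        for i, field in enumerate(row):
            if i == len(widths):
                widths.append(0)  # first time this column index is seen
            widths[i] = max(widths[i], len(field))
    return longest_line, widths

# Hypothetical sample: company name plus two description columns.
sample = (
    "Acme\t" + "x" * 120 + "\t" + "y" * 40 + "\n"
    "Globex Corp\tshort\t" + "z" * 300 + "\n"
)

longest_line, widths = measure_tab_text(sample)
print(longest_line, widths)  # → 318 [11, 120, 300]
```

The longest line suggests a lower bound for lrecl, and the per-column widths suggest lower bounds for the length statement.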
If you have a license for SAS/ACCESS, you can use a libname to access the Excel spreadsheet and get at the Excel data just like a SAS data set.
(but @Neil Neyman's answer sounds good too)

summing up excel files in matlab

Is there an easy way to sum up various Excel files in MATLAB?
What I really want is similar to the DOS command
type file*.xls > sumfile.xls
I have 10-100 Excel files with similar file name formats except for the date:
XXXXX_2010_03_03.xls, XXXXX_2010_03_03.xls and so on.
Is there a command to copy the files one after the other? All files are of different lengths, so I cannot know the position of the rows after each file. I would like to have them copied into the same Excel sheet.
Thanks
Get file names:
names=dir('XXXXX_*.xls'); % matches names like XXXXX_2010_03_03.xls
names={names.name};
output='out.xls';
First file. This will overwrite the output each time you run this program - it's up to you if this is the behavior you want.
copyfile(names{1},output);
Cycle through the remaining files:
for i=2:length(names)
    num_in = xlsread(names{i}); % read the data
    num_out = xlsread(output); % re-read the output to find its current size
    range=['A' num2str(size(num_out,1)+1)]; % next free line
    xlswrite(output, num_in, 1, range); % always write to the 1st sheet
end
This should work if (1) you only have numerical data and (2) you want to concatenate ("sum", as you put it) the files top to bottom.
If (1) is wrong, please read xlsread's help -- look for txt and raw outputs.
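The "next free line" bookkeeping in the loop can be sketched on its own. The Python snippet below is an illustration of the arithmetic only and does not read or write Excel files; the function name append_ranges and the row counts are hypothetical.

```python
def append_ranges(row_counts):
    """Return the column-A cell where each file's rows would start."""
    ranges = []
    next_free_row = 1
    for count in row_counts:
        ranges.append(f"A{next_free_row}")
        next_free_row += count  # the next file starts right below this one
    return ranges

# Three hypothetical files with 3, 5 and 2 data rows.
print(append_ranges([3, 5, 2]))  # → ['A1', 'A4', 'A9']
```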
Use xlswrite(filename, M, range) to write your files one after the other. Read the Excel file into M with xlsread.
"xlswrite(filename, M, range) writes matrix M to a rectangular region specified by range in the first worksheet of the file filename."
