writetable replace NaN with blanks in Matlab - excel

Given a MATLAB table that contains many NaN values, how can I write this table to an Excel or CSV file where the NaNs are replaced by blanks?
I use the following code:
T = array2table(NaN(5,2),'VariableNames',{'A','C'})
writetable(T, filename)
I do not want to replace the NaNs with zeros. I want the output file to:
have blanks in place of NaN, and
include the variable names.

You just need xlswrite for that; it writes NaNs as blanks itself. Use table2cell, or the combination of table2array and num2cell, to convert your table to a cell array first. Use the VariableNames property of the table to retrieve the variable names and stack them on top of the cell array.
data= [T.Properties.VariableNames; table2cell(T)];
%or data= [T.Properties.VariableNames; num2cell(table2array(T))];
xlswrite('output',data);
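If you are on a newer release, note that MathWorks discourages xlswrite (it is flagged "not recommended" as of R2019a). A minimal sketch using writecell instead, assuming R2019a or later; the NaN cells are blanked out explicitly so they land as empty cells in the sheet:
data = [T.Properties.VariableNames; table2cell(T)];
data(cellfun(@(x) isnumeric(x) && isscalar(x) && isnan(x), data)) = {[]}; % blank out NaN cells
writecell(data, 'output.xlsx'); % empty cells are written as blanks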
Sample run for:
T = table([1;2;3],[NaN; 410; 6],[31; NaN; 27],'VariableNames',{'One' 'Two' 'Three'})
T =
  3×3 table
    One    Two    Three
    ___    ___    _____
     1     NaN     31
     2     410    NaN
     3       6     27
yields an Excel file (output.xls by default) in which the NaN entries appear as blank cells.
Although the above solution is simpler in my opinion, if you really want to use writetable, then:
tmp = table2cell(T); %Converting the table to a cell array
tmp(isnan(T.Variables)) = {[]}; %Replacing the NaN entries with []
T = array2table(tmp,'VariableNames',T.Properties.VariableNames); %Converting back to table
writetable(T,'output.csv'); %Writing to a csv file

I honestly think the most straightforward way to output the data in the format you describe is to use xlswrite as Sardar did in his answer. However, if you really want to use writetable, the only option I can think of is to encapsulate every value in the table in a cell array and replace the NaN entries with empty cells. Starting with this sample table T with random data and NaN values:
T = table(rand(5,1), [nan; rand(3,1); nan], 'VariableNames', {'A', 'C'});
T =
            A                    C
    _________________    _________________
    0.337719409821377                  NaN
    0.900053846417662    0.389738836961253
    0.369246781120215    0.241691285913833
    0.111202755293787    0.403912145588115
    0.780252068321138                  NaN
Here's a general way to do the conversion:
for name = T.Properties.VariableNames   % Loop over variable names
    temp = num2cell(T.(name{1}));       % Convert numeric array to cell array
    temp(cellfun(@isnan, temp)) = {[]}; % Set cells containing NaN to empty
    T.(name{1}) = temp;                 % Place back into the table
end
And here's what the table T ends up looking like:
T =
             A                      C
    ___________________    ___________________
    [0.337719409821377]    []
    [0.900053846417662]    [0.389738836961253]
    [0.369246781120215]    [0.241691285913833]
    [0.111202755293787]    [0.403912145588115]
    [0.780252068321138]    []
And now you can output it to a file with writetable:
writetable(T, 'sample.csv');

Related

How to replace text in column by the value contained in the columns named in this text

In PySpark, I'm trying to replace multiple text tokens in a column with the values of the columns whose names appear in the calc column (a formula).
So, to be clear, here is an example:
Input:
|param_1|param_2|calc
|-------|-------|--------
|Cell 1 |Cell 2 |param_1-param_2
|Cell 3 |Cell 4 |param_2/param_1
Output needed:
|param_1|param_2|calc
|-------|-------|--------
|Cell 1 |Cell 2 |Cell 1-Cell 2
|Cell 3 |Cell 4 |Cell 4/Cell 3
In the calc column, the default value is a formula. It can be something as simple as the ones provided above, or something like "2*(param_8-param_4)/param_2-(param_3/param_7)".
What I'm looking for is a way to substitute every param_x with the value of the column of the same name.
I've tried a lot of things, but nothing works; most of the time, when I use replace or regexp_replace with a column as the replacement value, a 'Column is not iterable' error occurs.
Moreover, the columns param_1, param_2, ..., param_x are generated dynamically, and a calc value can reference some of these columns but not necessarily all of them.
Could you help me with a dynamic solution?
Thank you so much.
Best regards
Update: it turned out I had misunderstood the requirement. This would work:
for exp in ["regexp_replace(calc, '"+col+"', "+col+")" for col in df.schema.names]:
df=df.withColumn("calc", F.expr(exp))
Yet another update: to handle null values, add coalesce:
for exp in ["coalesce(regexp_replace(calc, '"+col+"', "+col+"), calc)" for col in df.schema.names]:
df=df.withColumn("calc", F.expr(exp))
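For context, here is a minimal self-contained version of that loop; the DataFrame contents are illustrative, and building the call through F.expr is what lets regexp_replace take a column reference as the replacement:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("Cell 1", "Cell 2", "param_1-param_2"),
     ("Cell 3", "Cell 4", "param_2/param_1")],
    ["param_1", "param_2", "calc"])

# Substitute each column name appearing in calc with that row's value;
# coalesce keeps the original string when a referenced column is null.
for col in df.schema.names:
    df = df.withColumn("calc", F.expr(
        "coalesce(regexp_replace(calc, '" + col + "', " + col + "), calc)"))
df.show()

One caveat: plain column names used as regex patterns can collide (param_1 also matches inside param_10), so anchoring the pattern with word boundaries, e.g. '\\bparam_1\\b', is safer when such names can occur.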
------- Keeping the below section for a while just for reference -------
You can't do that directly, as you won't be able to use a column's value as an expression unless you collect it into a Python object (which is obviously not recommended).
This would work for the same input: each row's formula is collected into a dictionary keyed by a row number, and a single CASE expression then evaluates the matching formula for each row:
from pyspark.sql import functions as F
from pyspark.sql.window import Window

df = spark.createDataFrame([["1", "2", "param_1 - param_2"], ["3", "4", "2*param_1 + param_2"]]).toDF("param_1", "param_2", "calc")
df.show()
df = df.withColumn("row_num", F.row_number().over(Window.orderBy(F.lit("dummy"))))
as_dict = {row.asDict()["row_num"]: row.asDict()["calc"] for row in df.select("row_num", "calc").collect()}
expression = f"""CASE {' '.join([f"WHEN row_num = '{k}' THEN ({v})" for k, v in as_dict.items()])} ELSE NULL END"""
df.withColumn("Result", F.expr(expression)).show()

Display 2 decimal places, and use comma as separator in pandas?

Is there any way to replace the dot in a float with a comma and keep a precision of 2 decimal places?
Example 1: 105 ---> 105,00
Example 2: 99.2 ---> 99,20
I used a lambda function, df['abc'] = df['abc'].apply(lambda x: f"{x:.2f}".replace('.', ',')), but then I get an invalid format in Excel.
I'm updating a specific sheet in Excel, so I'm using:
wb = load_workbook(filename)
ws = wb["FULL"]
for row in dataframe_to_rows(df, index=False, header=True):
    ws.append(row)
Let us try
out = (s//1).astype(int).astype(str)+','+(s%1*100).astype(int).astype(str).str.zfill(2)
0 105,00
1 99,20
dtype: object
Input data:
s = pd.Series([105, 99.2])
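A hedged aside: because the fractional part here is rebuilt arithmetically, float artifacts can truncate it wrongly for some inputs (for example, 0.29 % 1 * 100 evaluates to 28.999..., which the int cast turns into 28). Formatting the number instead of doing arithmetic avoids this, which is essentially the approach in the next answer:

import pandas as pd

s = pd.Series([105, 99.2])
out = s.map(lambda x: f"{x:.2f}".replace('.', ','))  # '105,00', '99,20'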
s = pd.Series([105, 99.22]).apply(lambda x: f"{x:.2f}".replace('.', ','))
First, .apply takes a function and applies it to every element of the Series.
The f-string f"{x:.2f}" turns a float into a 2-decimal-place string with '.'.
After that, .replace('.', ',') just replaces '.' with ','.
You can change pd.Series([105, 99.22]) to match your dataframe.
I think you're mixing something up here. In Excel you can set the display format, i.e. the format in which numbers are printed (the icon with +/- and zeros).
But that is not the format of the cell's value; the cell is numeric either way. Your approach changes the cell value rather than its formatting: in your question you save it as a string, so it is read back from Excel as a string.
Having said that, don't format the value; upgrade your pandas (if you haven't done so already) and try something along these lines: https://stackoverflow.com/a/51072652/11610186
To elaborate, try replacing your for loop with:
i = 1
for row in dataframe_to_rows(df, index=False, header=True):
    ws.append(row)
    # replace D with the letter of the column you want to see formatted:
    ws[f'D{i}'].number_format = '#,##0.00'
    i += 1
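An equivalent sketch using enumerate and openpyxl's get_column_letter, if you would rather address the column by index than hard-code the letter (column 4, i.e. 'D', is illustrative here; wb, ws and df are reused from the question):

from openpyxl.utils import get_column_letter
from openpyxl.utils.dataframe import dataframe_to_rows

col_letter = get_column_letter(4)  # 4 -> 'D'; point this at the numeric column
for i, row in enumerate(dataframe_to_rows(df, index=False, header=True), start=1):
    ws.append(row)
    ws[f'{col_letter}{i}'].number_format = '#,##0.00'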
Well, I found another way to specify the float format directly in Excel, using this code:
for col_cell in ws['S':'CP']:
    for i in col_cell:
        i.number_format = '0.00'
Note that whether Excel then displays a comma or a dot as the decimal separator depends on the workbook's locale settings, not on the format string itself.

Can't drop na with pandas read excel file in Python

I am trying to remove all-NaN rows from a dataframe that I get via pd.read_excel("test.xlsx", sheet_name="Sheet1"). I have tried df = df.dropna(how='all') and df.dropna(how='all', inplace=True); neither removes the last empty row, which I printed with df.tail(1):
         a    b   c
3463   NaN  NaN
I noticed the value in column c is not null but empty. Could someone help me deal with this issue? Thank you.
Maybe you want to replace empty (whitespace-only) values with missing values first:
df = df.replace(r'^\s+$', np.nan, regex=True).dropna(how='all')
Regex ^\s+$ means:
^ is start of string
\s+ is one or more whitespaces
$ means end of string
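One untested caveat: ^\s+$ requires at least one whitespace character, so a truly empty string '' will not match it. If the stray cell may be genuinely empty rather than whitespace, ^\s*$ (zero or more) covers both cases:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan], 'b': [2.0, np.nan], 'c': ['x', '']})  # illustrative
df = df.replace(r'^\s*$', np.nan, regex=True).dropna(how='all')  # drops the all-empty row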
Here NaN is also a value, and an empty string is also treated as part of the row.
In the case of NaN, you must either drop it or replace it with something:
dropna()
If you use this function, then whenever pandas finds NaN in a row it will remove the whole row, regardless of what other values the row contains (with the default how='any').
fillna() to fill some value in place of NaN
In your case:
df['C'] = df['C'].fillna(value="Any value")
Note: it is important to specify the columns in which you want to fill values; otherwise it will update the whole dataframe wherever there is NaN.
Now, if there is an empty value, try this:
df.loc[df['C'] == "", 'C'] = "Anyvalue"
I have not tried this, but my reasoning on the above is as follows. Let's break it down:
a. df['C'] == ""
This returns boolean values.
b. df.loc[df['C'] == "", 'C'] = "Anyvalue"
Wherever pandas finds True, the value "Anyvalue" gets applied.
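A small untested sketch pulling the two steps together (the column name and fill value are illustrative):

import pandas as pd

df = pd.DataFrame({'C': [None, '', 'kept']})
df['C'] = df['C'].fillna('Any value')                  # fill real NaN values
df.loc[df['C'].str.strip() == '', 'C'] = 'Any value'   # fill empty/blank strings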

Removing 'NaN' strings and [] cells from cell array in Matlab

I have a cell array, given as
raw = {100 3.2 38 1;
100 3.7 38 1;
100 'NaN' 'NaN' 1;
100 3.8 38 [];
'NaN' 'NaN' 'NaN' 'NaN';
'NaN' 'NaN' 'NaN' [];
100 3.8 38 1};
How can I remove the rows which have at least one 'NaN' string or one empty cell []? So, in this case, I want to remove the 3rd, 4th, 5th and 6th rows from the cell array above. Thanks in advance!
In your cell array the NaN values are defined as strings, not as the "special" value NaN.
In that case, you can use the functions isempty and isfloat to identify which elements of the cell array are either empty or of type float:
% Remove rows with empty cells
idx = any(cell2mat(cellfun(@isempty, raw, 'UniformOutput', false)), 2)
raw(idx,:) = []
% Remove rows with 'NaN' strings
idx = all(cell2mat(cellfun(@isfloat, raw, 'UniformOutput', false)), 2)
raw(~idx,:) = []
In the first step you look for the empty cells using the function isempty; since the input is a cell array, you have to use cellfun to apply the function to all the elements of the cell array.
isempty returns a cell array of 0s and 1s where 1 identifies an empty cell, so, after converting it into an array (with the function cell2mat), you can identify the indices of the rows with an empty cell using the function any.
In the second step, with a similar approach, you can identify the rows containing floating-point values with the function isfloat.
The same approach can be used in case the NaN values in your cell array are defined as values and not as strings (remove the empty cells first: isnan([]) returns an empty result, which cell2mat cannot concatenate):
idx = any(cell2mat(cellfun(@isempty, raw, 'UniformOutput', false)), 2)
raw(idx,:) = []
idx = any(cell2mat(cellfun(@isnan, raw, 'UniformOutput', false)), 2)
raw(idx,:) = []
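A one-pass variant (an untested sketch) that tolerates empty cells, 'NaN' strings, and numeric NaN values alike, so the order of the checks no longer matters:

bad = cellfun(@(x) isempty(x) || isequal(x, 'NaN') ...
              || (isnumeric(x) && any(isnan(x))), raw);
raw(any(bad, 2), :) = []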
To find which rows contain 'NaN' strings, run:
idxNan = any(cellfun(@(x) isequal(x,'NaN'), raw), 2);
Similarly, to find which rows have empty cells run:
idxEmpty = any(cellfun(@(x) isempty(x), raw), 2);
Then you can omit the rows you don't want using logical 'or':
raw(idxNan | idxEmpty,:) = [];
Replace | with & if that is what you meant.

Remove ''' from beginning and end of the strings in MATLAB

How can we remove ' characters from the beginning and end of the strings in a cell array in MATLAB R2015a? Suppose that we have a cell array in which opening any one of the cells shows a quoted string such as ''2310''.
I want to convert the whole cell array to double (numbers). Suppose that our cell array is final. Using cellfun(@str2double, final) returns NaN for all cells; str2double(final) returns NaN too.
PS.
The last 10 elements of final in the command prompt have this structure:
ans =
''2310''
''2319''
''2313''
''2318''
''2301''
''2302''
''2303''
''2312''
''2304''
''2309''
You can replace all of the apostrophe characters with nothing, then apply str2double to each cell in your cell array.
Given that your cell is stored in final, do something like this:
final_rep = strrep(final, '''', '');
out = cellfun(#str2double, final_rep);
Basically, use strrep to replace all of the apostrophe characters with nothing, then apply str2double to each cell in your cell array via cellfun.
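A side note: in newer releases (R2016b and later) the erase function does the same removal in one call, though the question targets R2015a, where strrep is the portable choice. A hedged one-liner for the newer case:
out = cellfun(@str2double, erase(final, ''''));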
Given your example above:
final = {'''2310'''
'''2319'''
'''2313'''
'''2318'''
'''2301'''
'''2302'''
'''2303'''
'''2312'''
'''2304'''
'''2309'''};
We now get this:
out =
2310
2319
2313
2318
2301
2302
2303
2312
2304
2309
>> class(out)
ans =
double
As you can see, the output of the array is double, as we expect.
