Matlab error: Invalid call to strsplit - string

I am trying to split a string of numbers into groups of three. Here is my code:
tline =fgetl(fid);
in_points=fgetl(fid);
B = strrep(in_points,' ',' ')
C = char(strsplit(B));
points = reshape(str2num(C), 3, [])'
My input file looks like this:
output 1 for p=0.01
8 8 1 4 15 1 5 17 1 17 17 1 13 1 2 10 3 2 16 4 2 18 6 2 6 3 3 9 3 3 9 7 3 2 13 3 7 18 3 19 20 3 12 4 4 1 6 4 12 10 5 9 12 5 8 19 5 18 4 6 13 9 6 12 16 6 6 8 7 17 12 7 18 6 8 7 15 8 8 8 9 3 19 9 17 19 9 20 2 10 20 4 10 3 8 10 11 7 11 10 12 11 4 14 11 19 3 12 4 11 12 6 11 12 11 13 12 19 14 12 13 15 12 14 18 12 3 19 12 1 3 13 9 9 13 20 10 13 5 13 13 4 17 13 15 16 14 11 18 14 20 3 15 6 13 15 7 16 15 12 17 15 9 1 16 11 1 16 9 5 16 11 12 16 11 16 16 20 19 16 19 13 17 16 16 17 5 19 17 19 1 18 20 10 18 13 16 18 6 1 19 16 4 19 20 7 19 13 11 19 2 19 19 1 6 20 10 14 20 16 15 20 18 16 20 7 20 20
I want to separate the numbers as
8 8 1
4 15 1
5 17 1
and so on. When I run this code in Octave, it shows an error. Any help will be appreciated.

Your code seems generally fine to me, although, as mentioned in Hoki's comment, there's probably a cleaner way to do it.
The only error is that you never actually read in the line that holds the data. The first fgetl call reads your title line. The second reads the blank line between the title and your data, instead of the data line you presumably wanted in in_points.
If you add another fgetl between the tline and in_points lines, it worked for me.
>> points
points =
8 8 1
4 15 1
5 17 1
17 17 1
13 1 2
10 3 2
16 4 2
18 6 2
6 3 3
9 3 3
9 7 3
...
As Hoki mentioned, the B = strrep(in_points,' ',' ') line does nothing but replace a space with a space. I'm not sure what you were trying to do there.
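For comparison, the same read-then-group logic (skip the title line, skip any blank separator line, then cut the numbers into rows of three, which is what reshape(..., 3, [])' does above) can be sketched in Python. This is a language-neutral illustration, not an Octave fix; the function name read_points and the assumption that all numbers sit on one line are taken from the example input and are hypothetical.

```python
import io

def read_points(f):
    """Hypothetical helper: skip the title and blank lines, then group numbers in threes."""
    f.readline()                      # title line, e.g. "output 1 for p=0.01"
    line = f.readline()
    while line and not line.strip():  # skip blank separator line(s); "" means EOF
        line = f.readline()
    values = [int(tok) for tok in line.split()]
    if len(values) % 3:
        raise ValueError("number count is not a multiple of 3")
    # Equivalent of reshape(values, 3, [])' : consecutive triples become rows
    return [values[i:i + 3] for i in range(0, len(values), 3)]

sample = io.StringIO("output 1 for p=0.01\n\n8 8 1 4 15 1 5 17 1\n")
print(read_points(sample))  # [[8, 8, 1], [4, 15, 1], [5, 17, 1]]
```

The blank-line loop is what the extra fgetl call does in the answer; looping instead of reading exactly one extra line also tolerates files without the separator.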

Related

How to insert a pandas series as a new column in DataFrame, matching with the indexes of df with series of different length

I have a dataframe with multiple columns and 700+ rows, and a series of 27 rows. I want to add the series as a new column in the dataframe, matching the series index against the values of an existing column in df.
This is the dataframe I have. The series I need to add is indexed by the same codes that appear in the "Reason for absence" column:
ID Reason for absence Month of absence Day of the week Seasons
0 11 26 7 3 1
1 36 0 7 3 1
2 3 23 7 4 1
3 7 7 7 5 1
4 11 23 7 5 1
5 3 23 7 6 1
6 10 22 7 6 1
7 20 23 7 6 1
8 14 19 7 2 1
9 1 22 7 2 1
10 20 1 7 2 1
11 20 1 7 3 1
12 20 11 7 4 1
13 3 11 7 4 1
14 3 23 7 4 1
15 24 14 7 6 1
16 3 23 7 6 1
17 3 21 7 2 1
18 6 11 7 5 1
19 33 23 8 4 1
20 18 10 8 4 1
21 3 11 8 2 1
22 10 13 8 2 1
23 20 28 8 6 1
24 11 18 8 2 1
25 10 25 8 2 1
26 11 23 8 3 1
27 30 28 8 4 1
28 11 18 8 4 1
29 3 23 8 6 1
30 3 18 8 2 1
31 2 18 8 5 1
32 1 23 8 5 1
33 2 18 8 2 1
34 3 23 8 2 1
35 10 23 8 2 1
36 11 24 8 3 1
37 19 11 8 5 1
38 2 28 8 6 1
39 20 23 8 6 1
40 27 23 9 3 1
41 34 23 9 2 1
42 3 23 9 3 1
43 5 19 9 3 1
44 14 23 9 4 1
This is the series s_conditions:
0 Not absent
1 Infectious and parasitic diseases
2 Neoplasms
3 Diseases of the blood
4 Endocrine, nutritional and metabolic diseases
5 Mental and behavioural disorders
6 Diseases of the nervous system
7 Diseases of the eye
8 Diseases of the ear
9 Diseases of the circulatory system
10 Diseases of the respiratory system
11 Diseases of the digestive system
12 Diseases of the skin
13 Diseases of the musculoskeletal system
14 Diseases of the genitourinary system
15 Pregnancy and childbirth
16 Conditions from perinatal period
17 Congenital malformations
18 Symptoms not elsewhere classified
19 Injury
20 External causes
21 Factors influencing health status
22 Patient follow-up
23 Medical consultation
24 Blood donation
25 Laboratory examination
26 Unjustified absence
27 Physiotherapy
28 Dental consultation
dtype: object
I tried this:
df1.insert(loc=0, column="Reason_for_absence", value=s_conditions)
Output (this is wrong, because I need the Reason_for_absence column to be filled by looking up each row's "Reason for absence" code in the s_conditions index):
Reason_for_absence ID Reason for absence \
0 Not absent 11 26
1 Infectious and parasitic diseases 36 0
2 Neoplasms 3 23
3 Diseases of the blood 7 7
4 Endocrine, nutritional and metabolic diseases 11 23
5 Mental and behavioural disorders 3 23
6 Diseases of the nervous system 10 22
7 Diseases of the eye 20 23
8 Diseases of the ear 14 19
9 Diseases of the circulatory system 1 22
10 Diseases of the respiratory system 20 1
11 Diseases of the digestive system 20 1
12 Diseases of the skin 20 11
13 Diseases of the musculoskeletal system 3 11
14 Diseases of the genitourinary system 3 23
15 Pregnancy and childbirth 24 14
16 Conditions from perinatal period 3 23
17 Congenital malformations 3 21
18 Symptoms not elsewhere classified 6 11
19 Injury 33 23
20 External causes 18 10
21 Factors influencing health status 3 11
22 Patient follow-up 10 13
23 Medical consultation 20 28
24 Blood donation 11 18
25 Laboratory examination 10 25
26 Unjustified absence 11 23
27 Physiotherapy 30 28
28 Dental consultation 11 18
29 NaN 3 23
30 NaN 3 18
31 NaN 2 18
32 NaN 1 23
I get values only up to row 28 and NaN after that. Instead, I need the series values matched by index for all of the rows.
While this question is a bit confusing, it seems the desire is to match the series index with the dataframe "Reason for Absence" column. If this is correct, below is a small example of how to accomplish. Keep in mind, the resulting dataframe will be sorted based on the 'Reason for Absence Numerical' column. If my understanding is incorrect, please clarify this question so we can better assist you.
import pandas as pd
d = {'ID': [11,36,3], 'Reason for Absence Numerical': [3,2,1], 'Day of the Week': [4,2,6]}
dataframe = pd.DataFrame(data=d)
s = {0: 'Not absent', 1:'Neoplasms', 2:'Injury', 3:'Diseases of the eye'}
disease_series = pd.Series(data=s)
def add_series_to_df(df, series, index_val):
    # Rows whose numeric code equals index_val
    df_filtered = df[df['Reason for Absence Numerical'] == index_val].copy()
    # The single series entry with that index label
    series_filtered = series[series.index == index_val]
    if not df_filtered.empty:
        df_filtered['Reason for Absence Text'] = series_filtered.item()
    return df_filtered
# One (possibly empty) frame per code, stacked back together in code order
x = [add_series_to_df(dataframe, disease_series, index_val) for index_val in range(len(disease_series.index))]
new_df = pd.concat(x)
print(new_df)
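A shorter alternative, not taken from the answer above: df.insert(..., value=s_conditions) aligns the series on the dataframe's row index (0, 1, 2, ...), not on the "Reason for absence" codes, which is why only rows 0 to 28 get a value and the rest are NaN. Series.map with a Series argument instead looks up each code in that series' index and keeps the dataframe's original row order. The sketch below uses a few rows and entries from the question as illustrative data; codes missing from the series would become NaN.

```python
import pandas as pd

# A few rows from the question's dataframe (subset of columns)
df = pd.DataFrame({
    "ID": [11, 36, 3, 7],
    "Reason for absence": [26, 0, 23, 7],
})

# Part of s_conditions: index = reason code, value = description
s_conditions = pd.Series({
    0: "Not absent",
    7: "Diseases of the eye",
    23: "Medical consultation",
    26: "Unjustified absence",
})

# map() looks each code up in s_conditions' index; the row order of df is unchanged
df.insert(loc=0, column="Reason_for_absence",
          value=df["Reason for absence"].map(s_conditions))
print(df)
```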

A for loop creates more integers than expected in bash

I have two directories, base and to_move. There are 10 files in base, named
0 1 2 3 4 5 6 7 8 9, and 3 files, 0 1 2, in to_move. I want to move the 3 files in to_move to base, renaming them to 10 11 12.
Inside the directory to_move, I run the command
tmp=$(ls);for item in ${tmp[#]};do dst=$((item+10));echo $dst $item;done
What I got is:
10 0
11 1
12 2
11 1
20 10
21 11
22 12
23 13
24 14
25 15
26 16
27 17
28 18
29 19
12 2
30 20
31 21
32 22
33 23
34 24
35 25
36 26
37 27
38 28
13 3
14 4
15 5
16 6
17 7
18 8
19 9
This makes no sense to me; it seems $(($item+10)) has some weird effect on $item.
Why does this happen? And how can I modify the command to get this output?
10 0
11 1
12 2
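This does not diagnose the bash output above, but the intended renaming itself (each file n in to_move becomes n + 10 in base, where 10 is the number of files already in base) can be sketched in Python. The helper names plan_renames and move_with_offset are hypothetical, and the sketch assumes base holds exactly the files 0 to N-1 with purely numeric names.

```python
from pathlib import Path

def plan_renames(base_names, to_move_names):
    """Pair each file in to_move with its new name: old number + len(base)."""
    offset = len(base_names)  # assumes base holds 0..offset-1
    # Sort numerically so "10" does not come before "2"
    return [(name, str(int(name) + offset)) for name in sorted(to_move_names, key=int)]

def move_with_offset(base: Path, to_move: Path):
    """Move every file from to_move into base under its offset name."""
    plan = plan_renames([p.name for p in base.iterdir()],
                        [p.name for p in to_move.iterdir()])
    for src, dst in plan:
        target = base / dst
        if target.exists():  # Path.rename may silently overwrite on POSIX
            raise FileExistsError(target)
        (to_move / src).rename(target)

print(plan_renames([str(i) for i in range(10)], ["0", "1", "2"]))
# [('0', '10'), ('1', '11'), ('2', '12')]
```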

How to work on "age bins" in Pandas Dataframe which are saved as string?

I downloaded a dataset about Lego sets in .csv format from Kaggle. There's an "Ages" column like this:
df['Ages'].unique()
array(['6-12', '12+', '7-12', '10+', '5-12', '8-12', '4-7', '4-99', '4+',
'9-12', '16+', '14+', '9-14', '7-14', '8-14', '6+', '2-5', '1½-3',
'1½-5', '9+', '5-8', '10-21', '8+', '6-14', '5+', '10-16', '10-14',
'11-16', '12-16', '9-16', '7+'], dtype=object)
These categories are the suggested ages for playing with the Lego sets.
I intend to do some statistical analysis with these age bins. For example, I want to check the mean of these suggested ages.
However, each of them is a string:
type(lego_dataset.loc[0]['Ages'])
str
so I don't know how to work with the data.
I've already checked How to categorize a range of values in Pandas DataFrame.
But imagine there are 100 unique bins: it's not reasonable to prepare a list of 100 labels, one per category. There should be a better way.
I'm not entirely sure what output you are looking for, but see if the code and output below help.
df['Lage'] = df['Ages'].str.split('[-+]').str[0]
df['Uage'] = df['Ages'].str.split('[-+]').str[-1]
or
df['Lage'] = df['Ages'].str.extract(r'(\d+)', expand=True) # the ½ fractions in rows 17 & 18 are lost
df['Uage'] = df['Ages'].str.split('[-+]').str[-1]
Input
Ages
0 6-12
1 12+
2 7-12
3 10+
4 5-12
5 8-12
6 4-7
7 4-99
8 4+
9 9-12
10 16+
11 14+
12 9-14
13 7-14
14 8-14
15 6+
16 2-5
17 1½-3
18 1½-5
19 9+
20 5-8
21 10-21
22 8+
23 6-14
24 5+
25 10-16
26 10-14
27 11-16
28 12-16
29 9-16
30 7+
Output1
Ages Lage Uage
0 6-12 6 12
1 12+ 12
2 7-12 7 12
3 10+ 10
4 5-12 5 12
5 8-12 8 12
6 4-7 4 7
7 4-99 4 99
8 4+ 4
9 9-12 9 12
10 16+ 16
11 14+ 14
12 9-14 9 14
13 7-14 7 14
14 8-14 8 14
15 6+ 6
16 2-5 2 5
17 1½-3 1½ 3
18 1½-5 1½ 5
19 9+ 9
20 5-8 5 8
21 10-21 10 21
22 8+ 8
23 6-14 6 14
24 5+ 5
25 10-16 10 16
26 10-14 10 14
27 11-16 11 16
28 12-16 12 16
29 9-16 9 16
30 7+ 7
Output2
Ages Lage Uage
0 6-12 6 12
1 12+ 12
2 7-12 7 12
3 10+ 10
4 5-12 5 12
5 8-12 8 12
6 4-7 4 7
7 4-99 4 99
8 4+ 4
9 9-12 9 12
10 16+ 16
11 14+ 14
12 9-14 9 14
13 7-14 7 14
14 8-14 8 14
15 6+ 6
16 2-5 2 5
17 1½-3 1 3
18 1½-5 1 5
19 9+ 9
20 5-8 5 8
21 10-21 10 21
22 8+ 8
23 6-14 6 14
24 5+ 5
25 10-16 10 16
26 10-14 10 14
27 11-16 11 16
28 12-16 12 16
29 9-16 9 16
30 7+ 7
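Both outputs above are still strings. To compute statistics such as the mean, the bounds have to become numbers. A minimal sketch, assuming the only non-digit glyph is "½" and every value has the shape "low-high" or "low+": replace "½" with ".5", extract both bounds with one regular expression, and cast to float. Open-ended bins like "12+" get NaN as the upper bound; which number should represent such a bin (lower bound, midpoint, or something else) is a modelling choice left to you.

```python
import pandas as pd

ages = pd.Series(['6-12', '12+', '4-99', '1½-3', '2-5'])

# Turn the "½" glyph into ".5" so the bounds parse as floats
normalized = ages.str.replace('½', '.5', regex=False)

# low: number at the start; high: number after "-", if any; a trailing "+" is allowed
bounds = normalized.str.extract(r'^(?P<low>[\d.]+)(?:-(?P<high>[\d.]+))?\+?$')
bounds = bounds.astype(float)  # "12+" has no upper bound, so high is NaN

print(bounds)
print(bounds['low'].mean())  # mean of the lower bounds
```

On the full column, df['Ages'] would take the place of the illustrative ages series.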

Defining the range of a regression in which the X and Y will change

I need to run a simple regression with the Analysis ToolPak. The problem is that the ranges for the Y and X inputs are always different. To illustrate, here's an example of the table I need to work on:
A B C D E F G H I J K L
1 Y T T1 T2 T3 T4 T5 T6 T7 T8 T9 T10
2 19 1
3 13 2 19
4 14 3 13 19
5 16 4 14 13 19
6 17 5 16 14 13 19
7 16 6 17 16 14 13 19
8 20 7 16 17 16 14 13 19
9 10 8 20 16 17 16 14 13 19
10 20 9 10 20 16 17 16 14 13 19
11 11 10 20 10 20 16 17 16 14 13 19
12 11 11 11 20 10 20 16 17 16 14 13 19
13 14 12 11 11 20 10 20 16 17 16 14 13
14 15 13 14 11 11 20 10 20 16 17 16 14
15 17 14 15 14 11 11 20 10 20 16 17 16
16 10 15 17 15 14 11 11 20 10 20 16 17
17 4 16 10 17 15 14 11 11 20 10 20 16
18 15 17 4 10 17 15 14 11 11 20 10 20
19 6 18 15 4 10 17 15 14 11 11 20 10
20 10 19 6 15 4 10 17 15 14 11 11 20
21 16 20 10 6 15 4 10 17 15 14 11 11
22 16 10 6 15 4 10 17 15 14 11
23 16 10 6 15 4 10 17 15 14
24 16 10 6 15 4 10 17 15
25 16 10 6 15 4 10 17
26 16 10 6 15 4 10
27 16 10 6 15 4
28 16 10 6 15
29 16 10 6
30 16 10
31 16
In this example, the Y input would be the range A12:A21, because the first entry in the last column of the table (the "19" in cell L12) is in row 12 and the last entry in the first column of the table (the "16" in cell A21) is in row 21. For the same reasons, the X input would be the range B12:L21.
After the first regression, I need to delete two columns from the table and then run ANOTHER regression. For example, if I need to delete columns J and L, the table would look like this:
A B C D E F G H I J
1 Y T T1 T2 T3 T4 T5 T6 T7 T9
2 19 1
3 13 2 19
4 14 3 13 19
5 16 4 14 13 19
6 17 5 16 14 13 19
7 16 6 17 16 14 13 19
8 20 7 16 17 16 14 13 19
9 10 8 20 16 17 16 14 13 19
10 20 9 10 20 16 17 16 14 13
11 11 10 20 10 20 16 17 16 14 19
12 11 11 11 20 10 20 16 17 16 13
13 14 12 11 11 20 10 20 16 17 14
14 15 13 14 11 11 20 10 20 16 16
15 17 14 15 14 11 11 20 10 20 17
16 10 15 17 15 14 11 11 20 10 16
17 4 16 10 17 15 14 11 11 20 20
18 15 17 4 10 17 15 14 11 11 10
19 6 18 15 4 10 17 15 14 11 20
20 10 19 6 15 4 10 17 15 14 11
21 16 20 10 6 15 4 10 17 15 11
22 16 10 6 15 4 10 17 14
23 16 10 6 15 4 10 15
24 16 10 6 15 4 17
25 16 10 6 15 10
26 16 10 6 4
27 16 10 15
28 16 6
29 10
30 16
Now the regression inputs would be Y = A11:A21, because the first entry in the last column of the table (the "19" in cell J11) is in row 11 and the last entry in the first column of the table (the "16" in cell A21) is in row 21. Likewise, the X input would be B11:J21.
I have tried a hundred different ways, with no luck. This is the closest I've come to what I need, but I'm still lost, since it won't run the regression:
Sub Prueba1()
    Range("A1").Select
    Selection.End(xlToRight).Select
    Selection.End(xlDown).Select
    Selection.End(xlToLeft).Select
    Application.Run "ATPVBAEN.XLAM!Regress", Range(Selection, Selection.End(xlDown)).Select, _
        Range(Selection.Offset(, 1), Selection.End(xlToRight)).Select, False, False, , Range("S1") _
        , False, False, False, False, , False
End Sub
This user-defined function (UDF) returns the range, so you can pass it as a parameter to your Application.Run "ATPVBAEN.XLAM!Regress" call.
Function regress_range() As Range
    Dim strAddr As String, c As Long
    With Worksheets("Sheet4") '<~~set this worksheet name!
        With .Cells(1, 1).CurrentRegion
            ' From the first value below the header in the last column
            ' down to the last numeric value in the first column
            Set regress_range = .Range(.Cells(.Cells(1, .Columns.Count).End(xlDown).Row, 1), _
                .Cells(Application.Match(1E+99, .Columns(1)), .Columns.Count))
        End With
    End With
End Function
Make sure the third line references the correct worksheet.
The Run call would then look like this:
Application.Run "ATPVBAEN.XLAM!Regress", regress_range(), False, False, , Range("S1") _
, False, False, False, False, , False
I'm still concerned how Range("S1") may change (i.e. shift right) if columns are deleted from the regression range. Additionally, it has no explicitly referenced parent worksheet.
Addresses returned for your original data block and then for the block after deleting columns J and L:
$A$12:$L$21
$A$11:$J$21
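The rule from the question (start at the first value in the last column, stop at the last value in the first column) is independent of Excel, so it can be checked outside VBA. The Python sketch below models the sheet as a list of rows with None for empty cells; the helper names col_letter and regression_range and the small staircase grid are hypothetical stand-ins for the real worksheet.

```python
from string import ascii_uppercase

def col_letter(n):
    """Convert a 1-based column number to Excel letters (1 -> A, 27 -> AA)."""
    letters = ""
    while n:
        n, rem = divmod(n - 1, 26)
        letters = ascii_uppercase[rem] + letters
    return letters

def regression_range(grid):
    """grid[0] is the header row; returns an absolute address like $A$12:$L$21."""
    ncols = max(len(row) for row in grid)

    def cell(r, c):
        return grid[r][c] if c < len(grid[r]) else None

    last = ncols - 1
    # First data row (below the header) with a value in the last column
    first_row = next(r for r in range(1, len(grid)) if cell(r, last) is not None)
    # Last row with a value in the first column
    last_row = max(r for r in range(1, len(grid)) if cell(r, 0) is not None)
    # grid index 0 is Excel row 1
    return f"${col_letter(1)}${first_row + 1}:${col_letter(ncols)}${last_row + 1}"

grid = [
    ["Y", "T", "T1", "T2"],
    [19, 1, None, None],
    [13, 2, 19, None],
    [14, 3, 13, 19],
    [16, 4, 14, 13],
    [None, None, 16, 14],
    [None, None, None, 16],
]
print(regression_range(grid))  # $A$4:$D$5
```

As with the UDF, the returned block includes column A; the Y input is its first column and the X input is the rest.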

How to find the number of duplicate lines, where each line contains a few numbers separated by spaces

Suppose I have a file like this:
4 2 8 2 12 3 18 2 22 2 26 2 28 3 30 2
4 3 10 2 14 2 18 2 20 3 22 2 28 2 32 2
2 3 10 3 12 2 16 2 18 3 20 2 24 2 26 3
1 3 3 3 17 3 19 3 26 2 28 2 30 2 32 2
4 2 8 2 12 3 18 2 22 2 26 2 28 3 30 2
The first and the last lines of the input are the same.
I want the output to look like this:
4 2 8 2 12 3 18 2 22 2 26 2 28 3 30 2 2
4 3 10 2 14 2 18 2 20 3 22 2 28 2 32 2 1
2 3 10 3 12 2 16 2 18 3 20 2 24 2 26 3 1
1 3 3 3 17 3 19 3 26 2 28 2 30 2 32 2 1
The extra last column in the output gives the number of times each line occurs.
How can I do this in bash?
I know the sort command, but it only works with one number per line.
Building on sehe's suggestion, what about this?
sort your_file | uniq -c | awk '{for(i=2;i<=NF;i++) printf "%s\t", $i; printf "%s\n", $1}'
Output:
1 3 3 3 17 3 19 3 26 2 28 2 30 2 32 2 1
2 3 10 3 12 2 16 2 18 3 20 2 24 2 26 3 1
4 2 8 2 12 3 18 2 22 2 26 2 28 3 30 2 2
4 3 10 2 14 2 18 2 20 3 22 2 28 2 32 2 1
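Note that the sort | uniq -c pipeline reorders the lines, while the desired output in the question keeps them in their original order. The same counting idea can be sketched in Python with collections.Counter, which (being a dict) remembers the order in which lines were first seen. The function name count_duplicate_lines is hypothetical, and the sketch normalises runs of whitespace so lines that differ only in spacing count as duplicates.

```python
from collections import Counter

def count_duplicate_lines(lines):
    """Return each distinct line followed by how many times it occurs."""
    # Normalise whitespace so "4 2  8" and "4 2 8" count as the same line
    keys = [" ".join(line.split()) for line in lines if line.strip()]
    counts = Counter(keys)  # keeps first-seen order (Python 3.7+)
    return [f"{key} {n}" for key, n in counts.items()]

data = """4 2 8 2 12 3 18 2 22 2 26 2 28 3 30 2
4 3 10 2 14 2 18 2 20 3 22 2 28 2 32 2
2 3 10 3 12 2 16 2 18 3 20 2 24 2 26 3
1 3 3 3 17 3 19 3 26 2 28 2 30 2 32 2
4 2 8 2 12 3 18 2 22 2 26 2 28 3 30 2
""".splitlines()

for out in count_duplicate_lines(data):
    print(out)
```

To read a real file, pass it directly, for example count_duplicate_lines(open("your_file")).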
