A for loop creates more integers than expected in bash - linux

I have two directories, base and to_move. There are 10 files in base, named 0 1 2 3 4 5 6 7 8 9, and 3 files, named 0 1 2, in to_move. I want to move the 3 files from to_move to base, renaming them to 10 11 12.
Inside the directory to_move, I run this command:
tmp=$(ls);for item in ${tmp[#]};do dst=$((item+10));echo $dst $item;done
What I got is:
10 0
11 1
12 2
11 1
20 10
21 11
22 12
23 13
24 14
25 15
26 16
27 17
28 18
29 19
12 2
30 20
31 21
32 22
33 23
34 24
35 25
36 26
37 27
38 28
13 3
14 4
15 5
16 6
17 7
18 8
19 9
This makes no sense to me. It seems that $(($item+10)) has some weird effect on $item.
Why does this happen? And how can I modify the command to get this output?
10 0
11 1
12 2
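For comparison, the intended rename can be sketched in Python. This is only an illustration of the goal described above, not a diagnosis of the bash output; the function name move_with_offset and the temporary directories in the demo are assumptions made for the example.

```python
import shutil
import tempfile
from pathlib import Path


def move_with_offset(src_dir, dst_dir, offset):
    """Move numerically named files from src_dir to dst_dir, adding offset to each name."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    # Take the names once, before moving anything, and sort them numerically.
    names = sorted((p.name for p in src_dir.iterdir() if p.name.isdigit()), key=int)
    moved = []
    for name in names:
        new_name = str(int(name) + offset)
        shutil.move(str(src_dir / name), str(dst_dir / new_name))
        moved.append((new_name, name))
    return moved


# Demo in a throwaway directory that mirrors the layout from the question.
with tempfile.TemporaryDirectory() as root:
    base, to_move = Path(root, "base"), Path(root, "to_move")
    base.mkdir()
    to_move.mkdir()
    for i in range(10):
        (base / str(i)).touch()
    for i in range(3):
        (to_move / str(i)).touch()
    moved = move_with_offset(to_move, base, 10)
    base_names = sorted((p.name for p in base.iterdir()), key=int)

for new_name, old_name in moved:
    print(new_name, old_name)
```

The list of names is built only from the source directory and only once, so files that already exist in the destination cannot be picked up by the loop.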

Related

Recursive search in Spark DataFrame

I have an employee table that contains an employee ID and a supervisor ID. I want to find each employee's hierarchy up to five levels.
Example: employee 1 reports to 2, 2 reports to 4, 4 reports to 17, and 17 reports to 20. We could not find a supervisor for 20, so we set the supervisor of 20 to 20 itself.
EmployeeID  SupervisiorID
1           2
2           4
8           6
9           5
6           3
5           10
4           17
3           15
10          20
15          20
17          20
16          21
15          13
14          12
13          11
Expected output
EmployeeID  SupervisiorID_1  SupervisiorID_2  SupervisiorID_3  SupervisiorID_4  SupervisiorID_5
1           2                4                17               20               20
2           4                17               20               20               20
8           6                3                15               20               20
9           5                10               20               20               20
6           3                15               20               20               20
5           10               20               20               20               20
4           17               20               20               20               20
3           15               20               20               20               20
10          20               20               20               20               20
15          20               20               20               20               20
17          20               20               20               20               20
16          21               21               21               21               21
15          13               11               11               11               11
14          12               12               12               12               12
13          11               11               11               11               11
How can we achieve this in Spark using a DataFrame recursively?
Although this has been asked many times, someone has solved this here: https://dwgeek.com/spark-sql-recursive-dataframe-pyspark-and-scala.html/
If you only have 5 levels, then it is better to use 4 joins to do the job.
In my view, Spark does not natively support recursive solutions for this kind of scenario. If you really want to do it recursively, you may need to collect the data you need and process it locally on the driver.
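To make the fixed-depth idea concrete, here is a minimal sketch in plain Python rather than Spark. Each dictionary lookup plays the role of one self-join on SupervisiorID = EmployeeID. The function name supervisor_chain is an assumption, and the data is only the 1 → 2 → 4 → 17 → 20 chain from the example in the question.

```python
def supervisor_chain(employee_id, supervisor_of, levels=5):
    """Follow the supervisor link `levels` times, one lookup per level.

    An ID with no known supervisor is treated as its own supervisor,
    as the question does for employee 20.
    """
    chain = []
    current = employee_id
    for _ in range(levels):
        current = supervisor_of.get(current, current)
        chain.append(current)
    return chain


# Subset of the table from the question: EmployeeID -> SupervisiorID.
supervisor_of = {1: 2, 2: 4, 4: 17, 17: 20}

hierarchy = {emp: supervisor_chain(emp, supervisor_of) for emp in supervisor_of}
for emp, chain in hierarchy.items():
    print(emp, chain)
```

With DataFrames, the first level is already in the table, so the remaining four levels correspond to the four joins mentioned above; a left join plus a fallback to the previous level would reproduce the "own supervisor" rule.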

How to work on "age bins" in Pandas Dataframe which are saved as string?

I downloaded a dataset in .csv format from Kaggle about LEGO sets. It has an "Ages" column like this:
df['Ages'].unique()
array(['6-12', '12+', '7-12', '10+', '5-12', '8-12', '4-7', '4-99', '4+',
'9-12', '16+', '14+', '9-14', '7-14', '8-14', '6+', '2-5', '1½-3',
'1½-5', '9+', '5-8', '10-21', '8+', '6-14', '5+', '10-16', '10-14',
'11-16', '12-16', '9-16', '7+'], dtype=object)
These categories are the suggested ages for playing with the LEGO sets.
I intend to do some statistical analysis with these age bins. For example, I want to compute the mean of the suggested ages.
However, each value is a string:
type(lego_dataset.loc[0]['Ages'])
str
I don't know how to work with the data.
I've already checked "How to categorize a range of values in Pandas DataFrame".
But imagine there are 100 unique bins. It isn't reasonable to prepare a list of 100 labels, one for each category. There should be a better way.
I'm not entirely sure what output you are looking for. See if the code and output below help.
df['Lage'] = df['Ages'].str.split('[-+]').str[0]
df['Uage'] = df['Ages'].str.split('[-+]').str[-1]
or
df['Lage'] = df['Ages'].str.extract(r'(\d+)', expand=False) # you don't get the fractions for rows 17 & 18
df['Uage'] = df['Ages'].str.split('[-+]').str[-1]
Input
Ages
0 6-12
1 12+
2 7-12
3 10+
4 5-12
5 8-12
6 4-7
7 4-99
8 4+
9 9-12
10 16+
11 14+
12 9-14
13 7-14
14 8-14
15 6+
16 2-5
17 1½-3
18 1½-5
19 9+
20 5-8
21 10-21
22 8+
23 6-14
24 5+
25 10-16
26 10-14
27 11-16
28 12-16
29 9-16
30 7+
Output1
Ages Lage Uage
0 6-12 6 12
1 12+ 12
2 7-12 7 12
3 10+ 10
4 5-12 5 12
5 8-12 8 12
6 4-7 4 7
7 4-99 4 99
8 4+ 4
9 9-12 9 12
10 16+ 16
11 14+ 14
12 9-14 9 14
13 7-14 7 14
14 8-14 8 14
15 6+ 6
16 2-5 2 5
17 1½-3 1½ 3
18 1½-5 1½ 5
19 9+ 9
20 5-8 5 8
21 10-21 10 21
22 8+ 8
23 6-14 6 14
24 5+ 5
25 10-16 10 16
26 10-14 10 14
27 11-16 11 16
28 12-16 12 16
29 9-16 9 16
30 7+ 7
Output2
Ages Lage Uage
0 6-12 6 12
1 12+ 12
2 7-12 7 12
3 10+ 10
4 5-12 5 12
5 8-12 8 12
6 4-7 4 7
7 4-99 4 99
8 4+ 4
9 9-12 9 12
10 16+ 16
11 14+ 14
12 9-14 9 14
13 7-14 7 14
14 8-14 8 14
15 6+ 6
16 2-5 2 5
17 1½-3 1 3
18 1½-5 1 5
19 9+ 9
20 5-8 5 8
21 10-21 10 21
22 8+ 8
23 6-14 6 14
24 5+ 5
25 10-16 10 16
26 10-14 10 14
27 11-16 11 16
28 12-16 12 16
29 9-16 9 16
30 7+ 7
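Since the goal is a statistic such as a mean, the split strings still need to become numbers. The following is a rough sketch in plain Python (standard library only) of one way to do that; the function name parse_age_bin and the decision to report open-ended bins such as '12+' with an upper bound of None are assumptions, not part of the answer above.

```python
import re
from statistics import mean


def parse_age_bin(label):
    """Return (lower, upper) as floats; upper is None for open-ended bins like '12+'."""
    # '1½' becomes '1.5' so that it parses as an ordinary decimal number.
    text = label.replace("½", ".5")
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", text)]
    lower = numbers[0]
    upper = numbers[1] if len(numbers) > 1 else None
    return lower, upper


ages = ["6-12", "12+", "1½-3", "4-99"]
bounds = [parse_age_bin(a) for a in ages]
mean_lower = mean(lo for lo, _ in bounds)

print(bounds)
print(mean_lower)
```

The same function could be applied to the column with df['Ages'].map(parse_age_bin). How to treat open-ended bins when averaging upper bounds is a modelling choice that the data alone does not settle.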

Defining the range of a regression in which the X and Y will change

I need to create a simple regression with the Data Analysis ToolPak. The catch is that the ranges for the Y and X inputs are always different. To illustrate what I mean, here's an example of the table I need to work on:
     A   B   C   D   E   F   G   H   I   J   K   L
 1   Y   T  T1  T2  T3  T4  T5  T6  T7  T8  T9 T10
 2  19   1
 3  13   2  19
 4  14   3  13  19
 5  16   4  14  13  19
 6  17   5  16  14  13  19
 7  16   6  17  16  14  13  19
 8  20   7  16  17  16  14  13  19
 9  10   8  20  16  17  16  14  13  19
10  20   9  10  20  16  17  16  14  13  19
11  11  10  20  10  20  16  17  16  14  13  19
12  11  11  11  20  10  20  16  17  16  14  13  19
13  14  12  11  11  20  10  20  16  17  16  14  13
14  15  13  14  11  11  20  10  20  16  17  16  14
15  17  14  15  14  11  11  20  10  20  16  17  16
16  10  15  17  15  14  11  11  20  10  20  16  17
17   4  16  10  17  15  14  11  11  20  10  20  16
18  15  17   4  10  17  15  14  11  11  20  10  20
19   6  18  15   4  10  17  15  14  11  11  20  10
20  10  19   6  15   4  10  17  15  14  11  11  20
21  16  20  10   6  15   4  10  17  15  14  11  11
22          16  10   6  15   4  10  17  15  14  11
23              16  10   6  15   4  10  17  15  14
24                  16  10   6  15   4  10  17  15
25                      16  10   6  15   4  10  17
26                          16  10   6  15   4  10
27                              16  10   6  15   4
28                                  16  10   6  15
29                                      16  10   6
30                                          16  10
31                                              16
In this example, the Y input would be the range A12:A21, because the first entry in the last column of the table (the "19" in cell L12) is in row 12 and the last entry in the first column (the "16" in cell A21) is in row 21. The X input would be the range B12:L21 for the same reasons.
After doing the first regression, I need to delete two columns from the table and then run another regression. For example, if I delete columns J and L, the table looks like this:
     A   B   C   D   E   F   G   H   I   J
 1   Y   T  T1  T2  T3  T4  T5  T6  T7  T9
 2  19   1
 3  13   2  19
 4  14   3  13  19
 5  16   4  14  13  19
 6  17   5  16  14  13  19
 7  16   6  17  16  14  13  19
 8  20   7  16  17  16  14  13  19
 9  10   8  20  16  17  16  14  13  19
10  20   9  10  20  16  17  16  14  13
11  11  10  20  10  20  16  17  16  14  19
12  11  11  11  20  10  20  16  17  16  13
13  14  12  11  11  20  10  20  16  17  14
14  15  13  14  11  11  20  10  20  16  16
15  17  14  15  14  11  11  20  10  20  17
16  10  15  17  15  14  11  11  20  10  16
17   4  16  10  17  15  14  11  11  20  20
18  15  17   4  10  17  15  14  11  11  10
19   6  18  15   4  10  17  15  14  11  20
20  10  19   6  15   4  10  17  15  14  11
21  16  20  10   6  15   4  10  17  15  11
22          16  10   6  15   4  10  17  14
23              16  10   6  15   4  10  15
24                  16  10   6  15   4  17
25                      16  10   6  15  10
26                          16  10   6   4
27                              16  10  15
28                                  16   6
29                                      10
30                                      16
Now the regression inputs would be Y = A11:A21, because the first entry in the last column of the table ("19" in cell J11) is in row 11 and the last entry in the first column ("16" in cell A21) is in row 21. Likewise, the X input would be B11:J21 for the same reasons.
I have tried this a hundred different ways with no luck. This is the closest I've come to what I need, but I'm still lost because it won't execute the regression:
Sub Prueba1()
    Range("A1").Select
    Selection.End(xlToRight).Select
    Selection.End(xlDown).Select
    Selection.End(xlToLeft).Select
    Application.Run "ATPVBAEN.XLAM!Regress", Range(Selection, Selection.End(xlDown)).Select, _
        Range(Selection.Offset(, 1), Selection.End(xlToRight)).Select, False, False, , Range("S1") _
        , False, False, False, False, , False
End Sub
This User Defined Function (aka UDF) will return the range into your Application.Run "ATPVBAEN.XLAM!Regress" as a parameter.
Function regress_range()
    Dim strAddr As String, c As Long
    With Worksheets("Sheet4") '<~~set this worksheet name!
        With .Cells(1, 1).CurrentRegion
            Set regress_range = .Range(.Cells(.Cells(1, .Columns.Count).End(xlDown).Row, 1), _
                .Cells(Application.Match(1E+99, .Columns(1)), .Columns.Count))
        End With
    End With
End Function
You need to make sure that it is properly referencing the correct worksheet in the third line.
This would become part of the run command like,
Application.Run "ATPVBAEN.XLAM!Regress", regress_range(), False, False, , Range("S1") _
, False, False, False, False, , False
I'm still concerned how Range("S1") may change (i.e. shift right) if columns are deleted from the regression range. Additionally, it has no explicitly referenced parent worksheet.
Output starting at your original data block:
$A$12:$L$21
$A$11:$J$21
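The row-selection rule used here (start at the first row where the last column has an entry, stop at the last row where the first column has an entry) can be sketched outside VBA as well. The Python below is only an illustration on a small lagged table; the helper names lagged_table and regression_rows are assumptions, and empty cells are modelled as None.

```python
def lagged_table(y, lags):
    """Build rows [Y, T1, ..., T<lags>] where Tk is Y shifted down by k rows."""
    n = len(y)
    rows = []
    for r in range(n + lags):
        row = []
        for k in range(lags + 1):
            src = r - k
            row.append(y[src] if 0 <= src < n else None)
        rows.append(row)
    return rows


def regression_rows(table):
    """Return (first, last) 0-based row indices of the fully populated block."""
    # First row in which the last column has a value.
    first = next(i for i, row in enumerate(table) if row[-1] is not None)
    # Last row in which the first column has a value.
    last = max(i for i, row in enumerate(table) if row[0] is not None)
    return first, last


table = lagged_table([19, 13, 14, 16, 17], lags=2)
first, last = regression_rows(table)
print(first, last)
```

Every row between those two indices has a value in every column, which is why the same pair of rows bounds both the Y range and the X range.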

Matlab error: Invalid call to strsplit

I am trying to split a string of numbers into sets of three. Here is my code:
tline =fgetl(fid);
in_points=fgetl(fid);
B = strrep(in_points,' ',' ')
C = char(strsplit(B));
points = reshape(str2num(C), 3, [])'
My input file looks like this:
output 1 for p=0.01
8 8 1 4 15 1 5 17 1 17 17 1 13 1 2 10 3 2 16 4 2 18 6 2 6 3 3 9 3 3 9 7 3 2 13 3 7 18 3 19 20 3 12 4 4 1 6 4 12 10 5 9 12 5 8 19 5 18 4 6 13 9 6 12 16 6 6 8 7 17 12 7 18 6 8 7 15 8 8 8 9 3 19 9 17 19 9 20 2 10 20 4 10 3 8 10 11 7 11 10 12 11 4 14 11 19 3 12 4 11 12 6 11 12 11 13 12 19 14 12 13 15 12 14 18 12 3 19 12 1 3 13 9 9 13 20 10 13 5 13 13 4 17 13 15 16 14 11 18 14 20 3 15 6 13 15 7 16 15 12 17 15 9 1 16 11 1 16 9 5 16 11 12 16 11 16 16 20 19 16 19 13 17 16 16 17 5 19 17 19 1 18 20 10 18 13 16 18 6 1 19 16 4 19 20 7 19 13 11 19 2 19 19 1 6 20 10 14 20 16 15 20 18 16 20 7 20 20
I want to separate the numbers as
8 8 1
4 15 1
5 17 1
and so on. When I run this code in Octave, it shows an error. Any help will be appreciated.
Your code seems generally fine to me, although as mentioned in Hoki's comment there's probably a cleaner way to do it.
The only error is that you never actually read the first line of the data. The first fgetl call reads your title line. The second reads the blank line between the title and your data, instead of the line you probably wanted in in_points.
If you add another fgetl between the tline and in_points lines, it works; it did for me:
>> points
points =
8 8 1
4 15 1
5 17 1
17 17 1
13 1 2
10 3 2
16 4 2
18 6 2
6 3 3
9 3 3
9 7 3
...
As Hoki mentioned, the B = strrep(in_points,' ',' ') line does nothing but replace a space with a space. I'm not sure what you were trying to do there.
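The same reading steps can be sketched in Python to show why the blank line matters. This is an illustration only; the function name read_points and the in-memory sample file are assumptions, and the sample contains just the first nine numbers from the question.

```python
import io


def read_points(stream):
    """Skip the title line and any blank lines, then group the numbers in threes."""
    stream.readline()                  # title line, e.g. "output 1 for p=0.01"
    line = stream.readline()
    while line and not line.strip():   # skip blank lines before the data
        line = stream.readline()
    values = [int(token) for token in line.split()]
    # A trailing group is shorter than 3 if the count is not a multiple of 3.
    return [values[i:i + 3] for i in range(0, len(values), 3)]


sample = io.StringIO("output 1 for p=0.01\n\n8 8 1 4 15 1 5 17 1\n")
points = read_points(sample)
print(points)
```

Splitting on any run of whitespace, as line.split() does here, also makes a separate space-replacement step unnecessary.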

pulling out the result from MATLAB to Excel row by row

I do calculations on 64 elements (for p=1:64 function end) and write the resulting values to an Excel file.
Is there any way to arrange the result values for each element row by row (the values of the first element should appear on the first row, the values of the second element should appear on the second row and so on)?
I used P=reshape(A,[],16), but MATLAB interleaves the values of the different elements.
For example,
If I set the loop for the calculation p=1:1 and use P=reshape(A,[],16) the result is:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
If I set p=1:2 the result becomes:
for element 1: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31
for element 2: 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32
(the values of element 2 are: 17 18 19 20 21 22 23 24 25 ... 32)
The result for p=1:2 should be:
for element 1: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
for element 2: 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
for element 3: 33 34 35 ,etc...
Try this:
P=reshape(A,16,[])'
Is this what you need?
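The reason this works is that MATLAB's reshape fills the result column by column. Here is a small sketch in plain Python that mimics that fill order to compare the two calls; the helper names reshape_column_major and transpose are assumptions made for the illustration.

```python
def reshape_column_major(values, rows, cols):
    """Mimic MATLAB's reshape: consecutive values run down each column."""
    if len(values) != rows * cols:
        raise ValueError("size mismatch")
    return [[values[c * rows + r] for c in range(cols)] for r in range(rows)]


def transpose(matrix):
    """Swap rows and columns, like the ' operator on a real matrix."""
    return [list(col) for col in zip(*matrix)]


A = list(range(1, 33))  # 16 values for element 1, then 16 for element 2

mixed = reshape_column_major(A, 2, 16)             # like reshape(A,[],16)
fixed = transpose(reshape_column_major(A, 16, 2))  # like reshape(A,16,[])'

print(mixed[0])
print(fixed[0])
```

In the first call, consecutive values are spread down the two rows, which interleaves the elements. In the second, each element's 16 values fill one column, and the transpose turns those columns into rows.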
