Seeming inconsistency in the way transpose |: works - j

Consider:
|: 2 3 $ 1 2 3
1 1
2 2
3 3
|: 1 2 3
1 2 3
The first one makes sense to me: the rows are now columns. But, by analogy, I expected the output of the 2nd one to be:
|: 1 2 3
1
2
3
Why is it still a row, rather than a column?

|: reverses the order of the axes of its argument, so
$ |: 2 3 $ 1 2 3
3 2
$ |: 1 2 3 $ 1 2 3
3 2 1
and naturally
$ |: 1 2 3
3
which is the list 1 2 3
The result that you expected has shape 3 1; you would get it as the transpose of the one-row table 1 3 $ 1 2 3:
] l =: 1 3 $ 1 2 3
1 2 3
|: l
1
2
3
($ l);($ |: l)
┌───┬───┐
│1 3│3 1│
└───┴───┘
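(Aside, not part of the original answer.) The same rank-based behaviour can be sketched in NumPy, where transposing a 1-D array is also a no-op and you only get a column after giving the data an explicit second axis, roughly analogous to 1 3 $ 1 2 3:
import numpy as np

v = np.array([1, 2, 3])
print(v.T.shape)        # (3,)  transposing a 1-D array reverses its single axis, so nothing changes
m = v.reshape(1, 3)     # give the data a second axis, like 1 3 $ 1 2 3 in J
print(m.T.shape)        # (3, 1)
print(m.T)
# [[1]
#  [2]
#  [3]]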

Related

Pandas how to turn each group into a dataframe using groupby

I have a dataframe that looks like:
A B
1 2
1 3
1 4
2 5
2 6
3 7
3 8
If I call df.groupby('A'), how do I turn each group into a sub-dataframe, so that it looks like this? For A=1:
A B
1 2
1 3
1 4
For A=2:
A B
2 5
2 6
For A=3:
A B
3 7
3 8
By using get_group:
g=df.groupby('A')
g.get_group(1)
Out[367]:
A B
0 1 2
1 1 3
2 1 4
You are close; you need to convert the groupby object to a dictionary of DataFrames:
dfs = dict(tuple(df.groupby('A')))
print (dfs[1])
A B
0 1 2
1 1 3
2 1 4
print (dfs[2])
A B
3 2 5
4 2 6
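If you only need to process each sub-dataframe rather than keep them all, you can also iterate over the groupby object directly; a minimal sketch using the question's column names (the print calls are just placeholders):
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 3, 3],
                   'B': [2, 3, 4, 5, 6, 7, 8]})

for key, sub in df.groupby('A'):
    # sub is the sub-DataFrame for this value of A
    print('A =', key)
    print(sub)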

Split one column into "n" columns with one character each

I have a file with one single column and 10 rows. Every row has the same number of characters (5). From this file I would like to get a file with 10 rows and 5 columns, where each column has 1 character only. I have no idea how to do that in Linux. Any help? Would AWK do this?
The real data has many more rows (>4K) and characters (>500K) though. Here is a short version of the real data:
31313
30442
11020
12324
00140
34223
34221
43124
12211
04312
Desired output:
3 1 3 1 3
3 0 4 4 2
1 1 0 2 0
1 2 3 2 4
0 0 1 4 0
3 4 2 2 3
3 4 2 2 1
4 3 1 2 4
1 2 2 1 1
0 4 3 1 2
Thanks!
I think that this does what you want:
$ awk -F '' '{ $1 = $1 }1' file
3 1 3 1 3
3 0 4 4 2
1 1 0 2 0
1 2 3 2 4
0 0 1 4 0
3 4 2 2 3
3 4 2 2 1
4 3 1 2 4
1 2 2 1 1
0 4 3 1 2
The input field separator is set to the empty string, so every character is treated as a field. $1 = $1 means that awk "touches" every record, causing it to be reformatted, inserting the output field separator (a space) between every character. 1 is the shortest "true" condition, causing awk to print every record.
Note that the behaviour of setting the field separator to an empty string isn't well-defined, so may not work on your version of awk. You may find that setting the field separator differently e.g. by using -v FS= works for you.
Alternatively, you can do more or less the same thing in Perl:
perl -F -lanE 'say "@F"' file
-a splits each input record into the special array @F. -F followed by nothing sets the input field separator to the empty string. The quotes around @F mean that the list separator (a space by default) is inserted between each element.
You can use this sed as well:
sed 's/./& /g; s/ $//' file
3 1 3 1 3
3 0 4 4 2
1 1 0 2 0
1 2 3 2 4
0 0 1 4 0
3 4 2 2 3
3 4 2 2 1
4 3 1 2 4
1 2 2 1 1
0 4 3 1 2
Oddly enough, this isn't trivial to do with most standard Unix tools (update: except, apparently, with awk). I would use Python:
python -c 'import sys; map(sys.stdout.write, map(" ".join, sys.stdin))' < in.txt > new.txt
(This isn't the greatest idiomatic Python, but it suffices for a simple one-liner.)
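Note that this relies on Python 2's eager map; under Python 3, map is lazy and nothing would be printed. A rough Python 3 equivalent, sketched along the same lines (file names as above):
python3 -c 'import sys; print("\n".join(" ".join(line.strip()) for line in sys.stdin))' < in.txt > new.txt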
Another Unix toolchain for this task:
$ while read line; do echo $line | fold -w1 | xargs; done < file
3 1 3 1 3
3 0 4 4 2
1 1 0 2 0
1 2 3 2 4
0 0 1 4 0
3 4 2 2 3
3 4 2 2 1
4 3 1 2 4
1 2 2 1 1
0 4 3 1 2

How to calculate standard deviation from different columns in a shell script

I have a datafile with 10 columns as given below
ifile.txt
2 4 4 2 1 2 2 4 2 1
3 3 1 5 3 3 4 5 3 3
4 3 3 2 2 1 2 3 4 2
5 3 1 3 1 2 4 5 6 8
I want to add an 11th column which will show the standard deviation of each row along the 10 columns, i.e. STDEV(2 4 4 2 1 2 2 4 2 1) and so on.
I am able to do it by taking the transpose, then using the following command, and again taking the transpose:
awk '{x[NR]=$0; s+=$1} END{a=s/NR; for (i in x){ss += (x[i]-a)^2} sd = sqrt(ss/NR); print sd}'
Can anybody suggest a simpler way, so that I can do it directly along each row?
You can do the same in a single pass as well:
awk '{for(i=1;i<=NF;i++){s+=$i;ss+=$i*$i}m=s/NF;$(NF+1)=sqrt(ss/NF-m*m);s=ss=0}1' ifile.txt
Do you mean something like this?
awk '{for(i=1;i<=NF;i++)s+=$i;M=s/NF;
for(i=1;i<=NF;i++)sd+=(($i-M)^2);$(NF+1)=sqrt(sd/NF);M=sd=s=0}1' file
2 4 4 2 1 2 2 4 2 1 1.11355
3 3 1 5 3 3 4 5 3 3 1.1
4 3 3 2 2 1 2 3 4 2 0.916515
5 3 1 3 1 2 4 5 6 8 2.13542
You just use the fields instead of transposing and using the rows.
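If awk is not a requirement, the same row-wise calculation can be sketched in Python; statistics.pstdev is the population standard deviation (dividing by the number of fields), matching the awk answers above, and the file name is taken from the question:
import statistics

# append the population standard deviation of each row as an 11th column
with open('ifile.txt') as f:
    for line in f:
        values = [float(x) for x in line.split()]
        print(line.rstrip(), statistics.pstdev(values))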

Reverse sort order of a multicolumn file in BASH

I've the following file:
1 2 3
1 4 5
1 6 7
2 3 5
5 2 1
and I want the file to be sorted by the second column, but from the largest number (in this case 6) to the smallest. I've tried
sort +1 -2 file.dat
but it sorts in ascending order (rather than descending).
The results should be:
1 6 7
1 4 5
2 3 5
5 2 1
1 2 3
sort -nrk 2,2
does the trick.
n for numeric sorting, r for reverse order and k 2,2 for the second column.
Have you tried -r ? From the man page:
-r, --reverse
reverse the result of comparisons
As mentioned, most versions of sort have the -r option; if yours doesn't, try tac:
$ sort -nk 2,2 file.dat | tac
1 6 7
1 4 5
2 3 5
5 2 1
1 2 3
$ sort -nrk 2,2 file.dat
1 6 7
1 4 5
2 3 5
5 2 1
1 2 3
tac - concatenate and print files in reverse
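For completeness, a small Python sketch of the same descending sort on the second numeric column (file name taken from the question):
with open('file.dat') as f:
    rows = [line.split() for line in f]

# sort by the numeric value of the second column, largest first
rows.sort(key=lambda r: float(r[1]), reverse=True)
for r in rows:
    print(' '.join(r))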

How can I implement a grouping algorithm in J?

I'm trying to implement A006751 in J. It's pretty easy to do in Haskell, something like:
concat . map (\g -> concat [show $ length g, [g !! 0]]) . group . show
(Obviously that's not complete, but it's the basic heart of it. I spent about 10 seconds on that, so treat it accordingly.) I can implement any of this fairly easily in J, but the part that eludes me is a good, idiomatic J algorithm that corresponds to Haskell's group function. I can write a clumsy one, but it doesn't feel like good J.
Can anyone implement Haskell's group in good J?
Groups are usually done with the /. adverb.
1 1 2 1 </. 'abcd'
┌───┬─┐
│abd│c│
└───┴─┘
As you can see, it's not sequential. Just make your key sequential like so (essentially determining whether each item is different from the one before it, and doing a running sum of the resulting 0's and 1's):
neq =. 13 : '0, (}. y) ~: (}: y)'
seqkey =. 13 : '+/\neq y'
(seqkey 1 1 2 1) </. 'abcd'
┌──┬─┬─┐
│ab│c│d│
└──┴─┴─┘
What I need then is a function which counts the items (#), and tells me what they are ({. to just pick the first). I got some inspiration from nubcount:
diffseqcount =. 13 : ',(seqkey y) (#,{.)/. y'
diffseqcount 2
1 2
diffseqcount 1 2
1 1 1 2
diffseqcount 1 1 1 2
3 1 1 2
If you want the nth result, just use power:
diffseqcount(^:10) 2 NB. 10th result
1 3 2 1 1 3 2 1 3 2 2 1 1 3 3 1 1 2 1 3 2 1 2 3 2 2 2 1 1 2
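For comparison (not J, just an illustration of the idea), Python's itertools.groupby is a close analogue of Haskell's group: it groups consecutive equal items, so the same count-and-first step can be sketched like this:
from itertools import groupby

def look_and_say_step(digits):
    # for each run of consecutive equal digits, emit (run length, digit)
    out = []
    for digit, run in groupby(digits):
        out += [len(list(run)), digit]
    return out

seq = [2]
for _ in range(3):
    seq = look_and_say_step(seq)
    print(seq)
# [1, 2]
# [1, 1, 1, 2]
# [3, 1, 1, 2]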
I agree that /. (Key) is the best general method for applying verbs to groups in J. An alternative in this case, where we need to group consecutive numbers that are the same, is dyadic ;. (Cut):
1 1 0 0 1 0 1 <(;.1) 3 1 1 1 2 2 3
┌─┬─────┬───┬─┐
│3│1 1 1│2 2│3│
└─┴─────┴───┴─┘
We can form the frets to use as the left argument as follows:
1 , 2 ~:/\ 3 1 1 1 2 2 3 NB. inserts ~: in the running sets of 2 numbers
1 1 0 0 1 0 1
Putting the two together:
(] <;.1~ 1 , 2 ~:/\ ]) 3 1 1 1 2 2 3
┌─┬─────┬───┬─┐
│3│1 1 1│2 2│3│
└─┴─────┴───┴─┘
Using the same mechanism as suggested previously:
,@(] (# , {.);.1~ 1 , 2 ~:/\ ]) 3 1 1 1 2 2 3
1 3 3 1 2 2 1 3
If you are looking for a nice J implementation of the look-and-say sequence then I'd suggest the one on Rosetta Code:
las=: ,@((# , {.);.1~ 1 , 2 ~:/\ ])&.(10x&#.inv)@]^:(1+i.@[)
5 las 1 NB. left arg is sequence length, right arg is starting number
11 21 1211 111221 312211
