Turning rows into columns based on a value in Excel?

I have a bunch of data in Excel (actually OpenOffice), ordered below one another. I need it consolidated by the first column, but so that all the data is still shown:
From (the first column actually goes up to 100):
1 283.038 244
1 279.899 494
1 255.139 992
1 254.606 7329
1 254.5 17145
1 251.008 23278
1 250.723 28758
1 247.753 92703
1 243.43 315278
1 242.928 485029
1 237.475 1226549
1 233.851 2076295
1 232.833 9826327
1 229.656 15965410
1 229.656 30000235
2 286.535 231
2 275.968 496
2 267.927 741
2 262.647 2153
2 258.925 3130
2 253.954 4857
2 249.551 9764
2 244.725 36878
2 243.825 318455
2 242.86 921618
2 238.401 1405028
2 234.984 3170031
2 233.168 4403799
2 229.317 8719139
2 224.395 26986035
2 224.395 30000056
3 269.715 247
3 268.652 469
3 251.214 957
3 249.04 30344
3 245.883 56115
3 241.753 289668
3 241.707 954750
3 240.684 1421766
3 240.178 1865750
3 235.09 2626524
3 233.579 5129755
3 232.517 7018880
3 232.256 18518741
3 228.75 19117443
3 228.75 30000051
to:
1 2 3
283.038 244 286.535 231 269.715 247
279.899 494 275.968 496 268.652 469
255.139 992 267.927 741 251.214 957
254.606 7329 262.647 2153 249.04 30344
254.5 17145 258.925 3130 245.883 56115
251.008 23278 253.954 4857 241.753 289668
250.723 28758 249.551 9764 241.707 954750
247.753 92703 244.725 36878 240.684 1421766
243.43 315278 243.825 318455 240.178 1865750
242.928 485029 242.86 921618 235.09 2626524
237.475 1226549 238.401 1405028 233.579 5129755
233.851 2076295 234.984 3170031 232.517 7018880
232.833 9826327 233.168 4403799 232.256 18518741
229.656 15965410 229.317 8719139 228.75 19117443
229.656 30000235 224.395 26986035 228.75 30000051
224.395 30000056
This must be really simple, but I couldn't find it. I tried a pivot table, but that only lets me summarise or count the fields, while I want them all to be displayed. Any ideas?
To elaborate on the pivot table: I put column 1 as the row, column 2 as the column and column 3 in the middle, but that comes out with a lot of empty cells and summarised values.
I am not sure which search terms to use; searching for unconsolidated pivot tables hasn't provided an answer.

After some discussion with my colleagues, this seemed undoable in Excel. So I just created a short script to save each "run" (which is the first column) in a separate file, from CSV:
awk -F, '{print > $1}' rw.txt
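If you then want the runs side by side, as in the layout above, something like the following should work (a sketch, untested; it assumes the runs are numbered 1 to 3 and keeps only the two data columns per run, so awk creates files named 1, 2 and 3 that paste then stitches together):

awk -F, '{ print $2, $3 > $1 }' rw.txt   # one file per run, data columns only
paste -d' ' 1 2 3 > wide.txt             # glue the runs together line by line

paste concatenates the files line by line, so when one run has more lines (like run 2 above), the other runs' columns are simply left empty on those lines.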

Related

How to clean up misaligned columns in text separated by spaces?

I have a file with the lines below in it, which are misaligned. I want to align the file properly so that the words in each line are spaced into neat columns.
3 281 901.188.30.53 901001 1 poihelloswqs-1146414
3 598 901.189.166.233 901001 1 poihelloswqs-90877846
3 300 901.156.77.57 901001 1 poihelloswqs-90137229
3 263 901.156.17.80 901001 1 poihelloswqs-90135797
3 264 901.875.875.79 901001 1 poihelloswqs-1389375
3 265 901.189.153.234 901001 1 poihelloswqs-1568332
3 266 901.218.93.873 901001 1 poihelloswqs-3240561
3 268 901.158.76.23 901001 1 poihelloswqs-3242066
3 269 901.218.30.120 901001 1 poihelloswqs-3242532
It should output something like the following, with all the columns properly aligned. Is this possible to do in Linux?
3 281 901.188.30.53   901001 1 poihelloswqs-1146414
3 598 901.189.166.233 901001 1 poihelloswqs-90877846
3 300 901.156.77.57   901001 1 poihelloswqs-90137229
3 263 901.156.17.80   901001 1 poihelloswqs-90135797
3 264 901.875.875.79  901001 1 poihelloswqs-1389375
3 265 901.189.153.234 901001 1 poihelloswqs-1568332
3 266 901.218.93.873  901001 1 poihelloswqs-3240561
3 268 901.158.76.23   901001 1 poihelloswqs-3242066
3 269 901.218.30.120  901001 1 poihelloswqs-3242532
It's very ugly but this may work:
cat xx | ruby -nle 'puts $_.split().join("\t")' |pr --expand-tabs -tT
You may need to replace the single \t with multiple tab characters if your columns vary greatly in width.
This is Ruby splitting each line on any whitespace, then joining the fields with tabs. Finally pr replaces the tabs with spaces (and -tT avoids the annoying headers and pagination).
If you know the fields are separated by a single character tr can do the replacement easily:
cat xx | tr ' ' "\t" |pr --expand-tabs -tT
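If the column utility is available (it comes with util-linux on most Linux systems), it does this in one step, padding every column to the width of its widest entry:

column -t xx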

How can I swap numbers inside data block of repeating format using linux commands?

I have a huge data file, and I want to swap some numbers in the 2nd column only, in the file format below. The file has 25,000,000 datasets, each 8768 lines long.
%% Edited: here is a shorter 10-line example. Sorry for the inconvenience. This is one typical data block.
# Dataset 1
#
# Number of lines 10
#
# header lines
5 11 3 10 120 90 0 0.952 0.881 0.898 2.744 0.034 0.030
10 12 3 5 125 112 0 0.952 0.897 0.905 2.775 0.026 0.030
50 10 3 48 129 120 0 1.061 0.977 0.965 3.063 0.001 0.026
120 2 4 5 50 186 193 0 0.881 0.965 0.899 0.917 3.669 0.000 -0.005
125 3 4 10 43 186 183 0 0.897 0.945 0.910 0.883 3.641 0.000 0.003
186 5 4 120 125 249 280 0 0.899 0.910 0.931 0.961 3.727 0.000 -0.001
193 6 4 120 275 118 268 0 0.917 0.895 0.897 0.937 3.799 0.000 0.023
201 8 4 278 129 131 280 0 0.921 0.837 0.870 0.934 3.572 0.000 0.008
249 9 4 186 355 179 317 0 0.931 0.844 0.907 0.928 3.615 0.000 0.008
280 10 4 186 201 340 359 0 0.961 0.934 0.904 0.898 3.700 0.000 0.033
#
# Dataset 1
#
# Number of lines 10
...
As you can see, there are 7 repeating header lines at the top, and 1 trailing line at the end of the dataset. The header and trailing lines all begin with #. As a result, each data block has 7 header lines, 8768 data lines, and 1 trailing line, for a total of 8776 lines. The trailing line contains only a single '#'.
I want to swap some numbers in the 2nd column only. First, I want to replace
1, 9, 10, 11 => 666
2, 6, 7, 8 => 333
3, 4, 5 => 222
of the 2nd column, and then,
666 => 6
333 => 3
222 => 2
of the 2nd column. I want to apply this replacement to every repeating dataset.
I tried this with Python, but the file is too big, so I run into memory errors. How can I perform this swapping with Linux commands like sed or awk?
Thanks
Best,
This might work for you, but you'd have to use GNU awk, as it uses the gensub function and $0 reassignment.
Put the following into an executable awk file (like script.awk):
#!/usr/bin/awk -f
BEGIN {
    a[1] = a[9] = a[10] = a[11] = 6
    a[2] = a[6] = a[7] = a[8] = 3
    a[3] = a[4] = a[5] = 2
}
function swap( c2, val ) {
    val = a[c2]
    return( val=="" ? c2 : val )
}
/^( [0-9]+ )/ { $0 = gensub( /^( [0-9]+)( [0-9]+)/, "\\1 " swap($2), 1 ) }
47 # print the line
Here's the breakdown:
BEGIN - set up an array a with mappings of the new values.
create a user-defined function swap to provide values for the 2nd column from the a array, or the value itself when there is no mapping. The c2 element is passed in, while val is a local variable (because no 2nd argument is passed in).
when a line starts with a space followed by a number and a space (the pattern), use gensub to replace the first occurrence of the first two number fields with the first field concatenated with a space and the return value from swap (the action). In this case, gensub's replacement text preserves the first column's data. The second column is passed to swap using the field identifier $2. Using gensub should preserve the formatting of the data lines.
47 - an expression that evaluates to true, which triggers the default action of printing $0; for data lines, $0 may have been modified above. Any line that wasn't "data" is printed here without modification.
The provided data doesn't show all the cases, so I made up my own test file:
# 2 skip me
9 2 not going to process me
 1 1 don't change the for matting
 2 2 4 23242.223 data
 3 3 data that's formatted
 4 4 7 that's formatted
 5 5 data that's formatted
 6 6 data that's formatted
 7 7 data that's formatted
 8 8 data that's formatted
 9 9 data that's formatted
 10 10 data that's formatted
 11 11 data that's formatted
 12 12 data that's formatted
 13 13 data that's formatted
 14 s data that's formatted
# some other data
Running the executable awk (like ./script.awk data) gives the following output:
# 2 skip me
9 2 not going to process me
 1 6 don't change the for matting
 2 3 4 23242.223 data
 3 2 data that's formatted
 4 2 7 that's formatted
 5 2 data that's formatted
 6 3 data that's formatted
 7 3 data that's formatted
 8 3 data that's formatted
 9 6 data that's formatted
 10 6 data that's formatted
 11 6 data that's formatted
 12 12 data that's formatted
 13 13 data that's formatted
 14 s data that's formatted
# some other data
which looks alright to me, but I'm not the one with 25 million datasets.
You'd also most definitely want to try this on a smaller sample of your data first (the first few datasets?) and redirect stdout to a temp file, perhaps like:
head -n 26328 data | ./script.awk - > tempfile
You can learn more about the elements used in this script here:
awk basics (the man page)
Arrays
User defined functions
String functions - gensub()
And of course, you should spend some quality time reviewing awk related questions and answers on Stack Overflow ;)
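If you don't need to preserve the exact spacing of the data lines, a plain POSIX awk one-liner does the same mapping while streaming the file, so memory is not an issue (a sketch; note that awk rewrites each modified line with single-space separators):

awk 'BEGIN { m[1]=m[9]=m[10]=m[11]=6; m[2]=m[6]=m[7]=m[8]=3; m[3]=m[4]=m[5]=2 }
     !/^#/ && ($2 in m) { $2 = m[$2] }
     1' data > swapped

The two-step detour through 666/333/222 in the question is only needed when substitutions are applied sequentially (as with chained sed commands); a lookup table maps each value in a single pass, so the intermediate values are unnecessary.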

sort multiple column file

I have a file a.dat as follows.
1 0.246102 21 1 0.0408359 0.00357267
2 0.234548 21 2 0.0401056 0.00264361
3 0.295771 21 3 0.0388905 0.00305116
4 0.190543 21 4 0.0371858 0.00427217
5 0.160047 21 5 0.0349674 0.00713894
I want to sort the file according to the values in the second column, i.e. the output should look like
5 0.160047 21 5 0.0349674 0.00713894
4 0.190543 21 4 0.0371858 0.00427217
2 0.234548 21 2 0.0401056 0.00264361
1 0.246102 21 1 0.0408359 0.00357267
3 0.295771 21 3 0.0388905 0.00305116
How can I do this from the command line? I read that the sort command can be used for this purpose, but I could not figure out how to use it.
Use sort -k to indicate the column you want to use:
$ sort -k2 file
5 0.160047 21 5 0.0349674 0.00713894
4 0.190543 21 4 0.0371858 0.00427217
2 0.234548 21 2 0.0401056 0.00264361
1 0.246102 21 1 0.0408359 0.00357267
3 0.295771 21 3 0.0388905 0.00305116
That does it in this case.
For future reference, note (as indicated by 1_CR) that you can also indicate the range of columns to be used with sort -k2,2 (just use column 2) or sort -k2,5 (from column 2 to 5), etc.
Note that you need to specify the start and end fields for sorting (2 and 2 in this case), and if you need numeric sorting, add n.
sort -k2,2n file.txt
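And if you want the largest values first, add r to reverse the order:

sort -k2,2nr file.txt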

How can I align columns where the biggest number or greatest string is the align indicator?

How can I right align (and left align?) a block of numbers or text in vim like this:
from:
45 209 25 1
2 4 2 3
34 5 300 5
34 120 34 12
to this:
45 209  25  1
 2   4   2  3
34   5 300  5
34 120  34 12
That means the biggest number or longest string in every column doesn't move.
In the first column those are 45 and 34, in the second column 209 and 120, in the third column 300 and in the last column 12.
Have a look at the Align plugin; it can do this and much more. A great tool in your utility belt!
After some serious vimhelp/reading I found the correct AlignCtrl mapping...
Visually select the table, e.g. by using ggVG, then do a \Tsp i.e. <leader>Tsp
Then I get this:
45 209  25  1
 2   4   2  3
34   5 300  5
34 120  34 12
From vimhelp:
\Tsp : use Align to make a table separated by blanks |alignmap-Tsp|
(right justified)
You can look into the Tabularize plugin. So if you have something like
45 209 25 1
2 4 2 3
34 5 300 5
34 120 34 12
just select those lines in visual mode and type :Tab / (the pattern being a single space) and it will format it as
45 209 25  1
2  4   2   3
34 5   300 5
34 120 34  12
Also, it looks like you don't have an equal number of spaces separating the numbers at the moment. So before you use the plugin, replace all runs of multiple spaces with a single space using the following substitution:
:%s![^ ]\zs \+! !g
With the Align plugin you can select the rows you want to align and hit :
<Leader>Tsp
From Align.txt
\Tsp : use Align to make a table separated by blanks |alignmap-Tsp|
(right justified)
(The help mention \ because it is the default leader but in case you have changed it to something else you must adapt accordingly)
Just trying it on my install, I got the following result:
45 209  25  1
 2   4   2  3
34   5 300  5
34 120  34 12
In my opinion Align plugin is great but the "align maps" and various commands are not really easy to remember.
With the Align and AlignMaps plugins: select using V, then \anum (AlignMaps comes with Align). One advantage of \anum is that it also handles decimal points (commas) and scientific notation.
I think the best thing to do is to first eat all multiple spaces with
:{range}s/ \+/ /g
And then call Tabularize
:Tab / /r1
Or change that r to l.
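If you only need left alignment and don't want a plugin, you can also filter the block through an external command (assuming the standard column utility is installed): select the lines with V, then run

:'<,'>!column -t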

Excel date/product count to specified limit

Column A is "Sales Dates", Column B is "=A2-A1" for "Date Diff", Column C is "Customer Name", Column D is "Item", and Column E is "Items Ordered Count".
My issue is that I have to keep a running 30-day total for each customer, to verify that specific items are not being ordered more than x times within any 30-day period.
Does anyone have any ideas?
I may not be fully understanding your question, but I don't think you can do what you ask in Excel. This might be a situation where a database that can do SQL would come in handy.
The best I can come up with in excel is a Pivot Table, with the customers as rows, dates as columns (group by month), and sum of Items Ordered in the data area. Then conditional format the data area to highlight values > your limit.
Perhaps if you provide some sample data & output I can come up with something more like what you need.
The formula would look something like this:
{=SUM(IF((A$2:A2>=A2-29)*(D$2:D2=D2),E$2:E2,0))}
It should be entered into cell F2 and copied down to the last row of your data. I pasted a test spreadsheet below so you can see where everything goes (sorry for the formatting; hopefully it will look better if you paste it into Excel).
IMPORTANT: This is an array formula, so after you type in the formula (and don't type in the braces {} when you do), you must press Ctrl-Shift-Enter instead of just Enter (see this link for more details).
What does the formula do? It does two loops:
First, it loops through all the Sales Dates from the beginning of the log to the current row and checks if each date is between the date of the current row and 29 days earlier (which makes a 30-day window). (By "current row" I mean the row where the formula is located.)
Second, it loops through all the Items from the beginning of the log to the current row and checks if there is a match with the Item of the current row.
For any row where both checks are true (the "*" in the formula does an "and" operation), Items Ordered Count is added to the sum, otherwise zero is added to the sum. So, when it's finished, you have a count for each row of how many orders there were in the past 30 days for that item.
HTH,
-Dan
Sales Dates  Date Diff  Customer Name  Item   Items Ordered Count  30-Day Count
1/1/2009     0          dfsadf         11336  70                   70
1/2/2009     1          asdfd          10218  121                  121
1/3/2009     1          fsdfjkfl       10942  101                  101
1/6/2009     3          slkdjflsk      13710  80                   80
1/7/2009     1          slkdjls        10480  127                  127
1/9/2009     2          sdjjf          11336  143                  213
1/11/2009    2          woieuriwe      11501  84                   84
1/14/2009    3          owqieyurtn     10191  78                   78
1/15/2009    1          weisd          10480  113                  240
1/16/2009    1          woieuriwe      12024  133                  133
1/17/2009    1          vkcjl          13818  125                  125
1/20/2009    3          sdflkj         11336  128                  341
1/23/2009    3          jnbkdl         10480  141                  381
1/25/2009    2          pqcvnlz        10480  137                  518
1/27/2009    2          hwodkjgfh      12878  80                   80
1/28/2009    1          zjdnfg;pwlkd   10942  123                  224
1/31/2009    3          zlkdjnf;psod   13173  93                   93
2/2/2009     2          zlknpdodfg     11336  119                  390
2/4/2009     2          zjhdfpwskjh    12004  57                   57
2/5/2009     1          asdfd          10218  121                  121
2/8/2009     3          fsdfjkfl       10942  101                  224
2/11/2009    3          slkdjflsk      13710  80                   80
2/14/2009    3          slkdjls        10480  127                  405
2/16/2009    2          sdjjf          11336  143                  390
2/18/2009    2          woieuriwe      11501  84                   84
2/21/2009    3          owqieyurtn     10191  78                   78
2/24/2009    3          weisd          10480  113                  240
2/25/2009    1          woieuriwe      12024  133                  133
2/27/2009    2          vkcjl          13818  125                  125
2/28/2009    1          sdflkj         11336  128                  390
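If you'd rather sanity-check the same rolling window outside Excel, here is a rough awk equivalent (a sketch with an assumed input: a hypothetical export orders.csv with comma-separated columns day,item,count, where day is a plain day number such as Excel's date serial):

awk -F, '
{
    day[NR] = $1; item[NR] = $2; cnt[NR] = $3     # remember every row seen so far
    sum = 0
    for (i = 1; i <= NR; i++)                     # same scan the array formula does
        if (day[i] >= $1 - 29 && item[i] == $2)
            sum += cnt[i]
    print $0 "," sum                              # append the 30-day item count
}' orders.csv

Like the array formula, it scans from the first row down to the current one for each line, which is quadratic but fine at these sizes.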
