How to get ordered, defined or all columns except or after or before a given column - linux
In BASH
I run the following one-liners to get an individual column/field after splitting on a given character (one can use AWK instead to split on more than one character, i.e. on a word).
# This gives me the first column, 'lori', i.e. the first field after splitting the string on the character '-'
echo "lori-chuck-shenzi" | cut -d'-' -f1
# This will give me 'chuck'
echo "lori-chuck-shenzi" | cut -d'-' -f2
# This will give me 'shenzi'
echo "lori-chuck-shenzi" | cut -d'-' -f3
# This will give me 'chuck-shenzi' i.e. all columns after 2nd and onwards.
echo "lori-chuck-shenzi" | cut -d'-' -f2-
Notice the last command above. How can I do the same thing as that last cut command in Groovy?
For ex: if the contents are in a file and they look like:
1 - a
2 - b
3 - c
4 - d
5 - e
6 - lori-chuck shenzi
7 - columnValue1-columnValue2-columnValue3-ColumnValue4
I tried the following Groovy code, but it does not give me lori-chuck shenzi. After ignoring the bullet number 6 and the first occurrence of -, I want my output to be lori-chuck shenzi, but the script returns just lori. (I know that is the correct result for index [1], which is what the code below uses.)
def file = "/path/to/my/file.txt"
File textfile= new File(file)
//now read each line from the file (using the file handle we created above)
textfile.eachLine { line ->
//list.add(line.split('-')[1])
println "Bullet entry full value is: " + line.split('-')[1]
}
// return list
Also, for the last line in the file above, is there an easy way in Groovy to change the order of the columns after they are split, i.e. reverse or reorder them like we do in Python with [1:], [:1], [:-1] etc.?
I don't like this solution, but I did the following to get it working. After taking the index range [1..-1] (i.e. from index 1 onwards, excluding index 0, which is the left-hand side of the first occurrence of the - character), I had to get rid of the [ and ] of the list by using join(',') and then replace every , with a - to get the result I was looking for.
list.add(line.split('-')[1..-1].join(',').replaceAll(',','-'))
I would still like to know what a better solution is, and how this can work for cherry-picking individual columns in a given order (instead of writing a separate Groovy statement for each element I want to pick from the string/list).
If I'm understanding your question correctly, what you want is:
line.split('-')[1..-1]
This will give you from position 1 to the last. You can do -2 (next to last) and so on, but just be aware that you can get an ArrayIndexOutOfBoundsException moving backwards too, if you go past the beginning of your array!
-- Original answer is above this line --
Adding to my answer, since comments don't allow code formatting. If all you want is to pick specific columns, and you want a string in the end, you could do something like:
def resultList = line.split('-')
def resultString = "${resultList[1]}-${resultList[2]} ${resultList[3]}"
and pick whatever columns you want that way. I thought you were looking for a more generic solution, but if not, specific columns are easy!
If you want the first value, a dash, then the rest joined by spaces, just use:
"${resultList[1]}-${resultList[2..-1].join(" ")}"
I don't know how to give you specific answers for every combination you might want, but basically once you have your values in a list, you can manipulate that however you want, and turn the results back into a string with GStrings or with .join(...).
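Since the question compares this to Python slicing, here is a minimal sketch of the same idea in Python (not Groovy): split once, pick fields by index or slice in whatever order you like, and join them back with the delimiter. The helper name cut_fields and its picks parameter are made up for this illustration.

```python
def cut_fields(line, delimiter, picks):
    """Return the chosen fields of `line`, joined by `delimiter`.

    `picks` is a list of 0-based indexes or slice objects; they are
    applied in the order given, so the caller controls column order.
    (Hypothetical helper, not part of any library.)
    """
    fields = line.split(delimiter)
    chosen = []
    for pick in picks:
        if isinstance(pick, slice):
            chosen.extend(fields[pick])   # a range of columns
        else:
            chosen.append(fields[pick])   # a single column
    return delimiter.join(chosen)


line = "columnValue1-columnValue2-columnValue3-ColumnValue4"
print(cut_fields(line, "-", [slice(1, None)]))         # like cut -d'-' -f2-
print(cut_fields(line, "-", [3, 0]))                   # cherry-pick and reorder
print(cut_fields(line, "-", [slice(None, None, -1)]))  # all columns, reversed
```

Joining with the same delimiter you split on avoids the join(',') followed by replaceAll(',', '-') detour from the question.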
Related
split String Variable in few numeric Variables in SPSS
I have a string variable with comma-separated numbers that I want to split into four numeric variables.
makeArr    var1a  var1b  var1c  var1d
6,8,13,10  6      8      13     10
10,11,2    10     11     2
7,1,14,3   7      1      14     3
With:
IF (CHAR.INDEX(makeArr,',') >= 1) f12a=CHAR.SUBSTR(makeArr,1,CHAR.INDEX(makeArr,',')-1).
EXECUTE.
IF (CHAR.INDEX(makeArr,',') >= 1) f12b=CHAR.SUBSTR(makeArr,CHAR.INDEX(makeArr,',')+1,CHAR.INDEX(makeArr,',')-1).
EXECUTE.
I always get the first variable written without any problems. This no longer works for the second variable, because it has a different length and the comma is also written there. So I would need to split at the comma and distribute the numbers between the commas over the new variables.
Since CHAR.INDEX will only tell you the location of the first occurrence of the search string, you need to start the second search from a new location, AFTER the first occurrence, and this gets more and more complicated as you continue. My suggestion is to create a copy of your array variable and cut pieces off it as you proceed, so that you are only ever searching for the first occurrence of "," each time.
First I recreate your example data to demonstrate on.
data list free/makeArr (a20).
begin data
"6,8,13,10" "10,11,2" "7,1,14,3"
end data.
Now I copy your array into a new variable #tmp. Note that I add a "," at the end so the syntax stays the same for all parts of the array. I add the "#" at the beginning of the name to make it invisible; you can remove it if you want. It is possible to do the following calculation in separate steps as you started to do, but it is nicer to loop through the steps (especially if this is an example for a longer array).
string f12a f12b f12c f12d #tmp (a20).
compute #tmp=concat(rtrim(makeArr),",").
do repeat nwvr=f12a f12b f12c f12d.
do IF #tmp<>"".
compute nwvr=CHAR.SUBSTR(#tmp,1,CHAR.INDEX(#tmp,',')-1).
compute #tmp=CHAR.SUBSTR(#tmp,CHAR.INDEX(#tmp,',')+1).
end if.
end repeat.
EXECUTE.
Here I found a different solution for what I think is the same problem: https://www.ibm.com/mysupport/s/question/0D50z00006PsP3tCAF/splitting-a-string-variable-divided-by-commas-into-new-single-variables?language=es
One line of code does the job:
spssinc trans result=var_1 to var_4 type=20/formula 're.split(", *", makeArr)'.
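The formula in that one-liner is an ordinary Python re.split call, so its behaviour can be checked outside SPSS. A minimal sketch, assuming every piece is an integer and that short rows should be padded with missing values (the function name split_to_columns and the width parameter are made up for this example):

```python
import re


def split_to_columns(value, width=4):
    """Split a comma-separated string into `width` numeric slots.

    Uses the same pattern as the SPSS formula above (a comma followed
    by optional spaces). Rows with fewer than `width` numbers are
    padded with None, standing in for a missing value.
    """
    parts = re.split(r", *", value.strip())
    numbers = [int(p) for p in parts if p]
    return numbers + [None] * (width - len(numbers))


for row in ["6,8,13,10", "10,11,2", "7,1,14,3"]:
    print(row, "->", split_to_columns(row))
```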
Simple sorting in linux
I'm quite new to linux, and I can't quite get to understand sorting. I need to sort a long file by column 4 and then column 5, ignoring the first line. The catch is, there are two separators - '.' and ',' - I don't know how to make sort command to include both of them. I guess it has to be sorted by the column that has "3" in the first line, and then in the second sort by the column that has "5" in the second line. And the second thing is I don't know how to keep the first line intact. Worth noting I can't change all ',' into '.', it has to stay intact. And I can't just remove the first line with tail or head, it has to stay. This is the text: d,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species 1,5.1,3.5,1.4,0.2,Iris-setosa 2,4.9,3.0,1.4,0.2,Iris-setosa 3,4.7,3.2,1.3,0.2,Iris-setosa 4,4.6,3.1,1.5,0.2,Iris-setosa 5,5.0,3.6,1.4,0.2,Iris-setosa 6,5.4,3.9,1.7,0.4,Iris-setosa 7,4.6,3.4,1.4,0.3,Iris-setosa 8,5.0,3.4,1.5,0.2,Iris-setosa 9,4.4,2.9,1.4,0.2,Iris-setosa 10,4.9,3.1,1.5,0.1,Iris-setosa 11,5.4,3.7,1.5,0.2,Iris-setosa 12,4.8,3.4,1.6,0.2,Iris-setosa 13,4.8,3.0,1.4,0.1,Iris-setosa 14,4.3,3.0,1.1,0.1,Iris-setosa 15,5.8,4.0,1.2,0.2,Iris-setosa 16,5.7,4.4,1.5,0.4,Iris-setosa 17,5.4,3.9,1.3,0.4,Iris-setosa 18,5.1,3.5,1.4,0.3,Iris-setosa 19,5.7,3.8,1.7,0.3,Iris-setosa 20,5.1,3.8,1.5,0.3,Iris-setosa 21,5.4,3.4,1.7,0.2,Iris-setosa 22,5.1,3.7,1.5,0.4,Iris-setosa 23,4.6,3.6,1.0,0.2,Iris-setosa 24,5.1,3.3,1.7,0.5,Iris-setosa 25,4.8,3.4,1.9,0.2,Iris-setosa 26,5.0,3.0,1.6,0.2,Iris-setosa 27,5.0,3.4,1.6,0.4,Iris-setosa 28,5.2,3.5,1.5,0.2,Iris-setosa 29,5.2,3.4,1.4,0.2,Iris-setosa 30,4.7,3.2,1.6,0.2,Iris-setosa 31,4.8,3.1,1.6,0.2,Iris-setosa 32,5.4,3.4,1.5,0.4,Iris-setosa 33,5.2,4.1,1.5,0.1,Iris-setosa 34,5.5,4.2,1.4,0.2,Iris-setosa 35,4.9,3.1,1.5,0.1,Iris-setosa 36,5.0,3.2,1.2,0.2,Iris-setosa 37,5.5,3.5,1.3,0.2,Iris-setosa 38,4.9,3.1,1.5,0.1,Iris-setosa 39,4.4,3.0,1.3,0.2,Iris-setosa 40,5.1,3.4,1.5,0.2,Iris-setosa 41,5.0,3.5,1.3,0.3,Iris-setosa 
42,4.5,2.3,1.3,0.3,Iris-setosa 43,4.4,3.2,1.3,0.2,Iris-setosa 44,5.0,3.5,1.6,0.6,Iris-setosa 45,5.1,3.8,1.9,0.4,Iris-setosa 46,4.8,3.0,1.4,0.3,Iris-setosa 47,5.1,3.8,1.6,0.2,Iris-setosa 48,4.6,3.2,1.4,0.2,Iris-setosa 49,5.3,3.7,1.5,0.2,Iris-setosa 50,5.0,3.3,1.4,0.2,Iris-setosa 51,7.0,3.2,4.7,1.4,Iris-versicolor 52,6.4,3.2,4.5,1.5,Iris-versicolor 53,6.9,3.1,4.9,1.5,Iris-versicolor 54,5.5,2.3,4.0,1.3,Iris-versicolor 55,6.5,2.8,4.6,1.5,Iris-versicolor 56,5.7,2.8,4.5,1.3,Iris-versicolor 57,6.3,3.3,4.7,1.6,Iris-versicolor 58,4.9,2.4,3.3,1.0,Iris-versicolor 59,6.6,2.9,4.6,1.3,Iris-versicolor 60,5.2,2.7,3.9,1.4,Iris-versicolor 61,5.0,2.0,3.5,1.0,Iris-versicolor 62,5.9,3.0,4.2,1.5,Iris-versicolor 63,6.0,2.2,4.0,1.0,Iris-versicolor 64,6.1,2.9,4.7,1.4,Iris-versicolor 65,5.6,2.9,3.6,1.3,Iris-versicolor 66,6.7,3.1,4.4,1.4,Iris-versicolor 67,5.6,3.0,4.5,1.5,Iris-versicolor 68,5.8,2.7,4.1,1.0,Iris-versicolor 69,6.2,2.2,4.5,1.5,Iris-versicolor 70,5.6,2.5,3.9,1.1,Iris-versicolor 71,5.9,3.2,4.8,1.8,Iris-versicolor 72,6.1,2.8,4.0,1.3,Iris-versicolor 73,6.3,2.5,4.9,1.5,Iris-versicolor 74,6.1,2.8,4.7,1.2,Iris-versicolor 75,6.4,2.9,4.3,1.3,Iris-versicolor 76,6.6,3.0,4.4,1.4,Iris-versicolor 77,6.8,2.8,4.8,1.4,Iris-versicolor 78,6.7,3.0,5.0,1.7,Iris-versicolor 79,6.0,2.9,4.5,1.5,Iris-versicolor 80,5.7,2.6,3.5,1.0,Iris-versicolor 81,5.5,2.4,3.8,1.1,Iris-versicolor 82,5.5,2.4,3.7,1.0,Iris-versicolor 83,5.8,2.7,3.9,1.2,Iris-versicolor 84,6.0,2.7,5.1,1.6,Iris-versicolor 85,5.4,3.0,4.5,1.5,Iris-versicolor 86,6.0,3.4,4.5,1.6,Iris-versicolor 87,6.7,3.1,4.7,1.5,Iris-versicolor 88,6.3,2.3,4.4,1.3,Iris-versicolor 89,5.6,3.0,4.1,1.3,Iris-versicolor 90,5.5,2.5,4.0,1.3,Iris-versicolor 91,5.5,2.6,4.4,1.2,Iris-versicolor 92,6.1,3.0,4.6,1.4,Iris-versicolor 93,5.8,2.6,4.0,1.2,Iris-versicolor 94,5.0,2.3,3.3,1.0,Iris-versicolor 95,5.6,2.7,4.2,1.3,Iris-versicolor 96,5.7,3.0,4.2,1.2,Iris-versicolor 97,5.7,2.9,4.2,1.3,Iris-versicolor 98,6.2,2.9,4.3,1.3,Iris-versicolor 99,5.1,2.5,3.0,1.1,Iris-versicolor 
100,5.7,2.8,4.1,1.3,Iris-versicolor 101,6.3,3.3,6.0,2.5,Iris-virginica 102,5.8,2.7,5.1,1.9,Iris-virginica 103,7.1,3.0,5.9,2.1,Iris-virginica 104,6.3,2.9,5.6,1.8,Iris-virginica 105,6.5,3.0,5.8,2.2,Iris-virginica 106,7.6,3.0,6.6,2.1,Iris-virginica 107,4.9,2.5,4.5,1.7,Iris-virginica 108,7.3,2.9,6.3,1.8,Iris-virginica 109,6.7,2.5,5.8,1.8,Iris-virginica 110,7.2,3.6,6.1,2.5,Iris-virginica 111,6.5,3.2,5.1,2.0,Iris-virginica 112,6.4,2.7,5.3,1.9,Iris-virginica 113,6.8,3.0,5.5,2.1,Iris-virginica 114,5.7,2.5,5.0,2.0,Iris-virginica 115,5.8,2.8,5.1,2.4,Iris-virginica 116,6.4,3.2,5.3,2.3,Iris-virginica 117,6.5,3.0,5.5,1.8,Iris-virginica
I'm not sure I understand the problem. To sort numerically on column 4 and then on column 5, you can simply say:
sort -t, -k4,4n -k5,5n
-t, says to use a comma as the column separator
-k4,4n says to sort numerically on column 4 only
-k5,5n says to break ties numerically on column 5
(Note that -k4,5 would define one single key running from column 4 through column 5, not two separate keys.)
Remarks:
As far as the hyphen is concerned: it is not part of the sorting columns, so why bother about it?
As far as the column headers are concerned: a numeric sort treats a non-numeric field as 0, so the header line comes before every row with a positive value and stays on top, so again why bother about it?
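If you would rather not depend on how sort treats the header line, the same ordering can be sketched in Python by setting the first line aside explicitly. This assumes comma-separated input with a single header line; the function name sort_keep_header is hypothetical.

```python
def sort_keep_header(lines, key_columns=(3, 4), sep=","):
    """Sort data rows numerically by the given 0-based columns.

    The first line is treated as a header and always stays first.
    Columns 3 and 4 (0-based) are the 4th and 5th columns of the file.
    """
    header, rows = lines[0], lines[1:]

    def key(row):
        fields = row.split(sep)
        return tuple(float(fields[i]) for i in key_columns)

    return [header] + sorted(rows, key=key)


sample = [
    "d,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species",
    "1,5.1,3.5,1.4,0.2,Iris-setosa",
    "6,5.4,3.9,1.7,0.4,Iris-setosa",
    "7,4.6,3.4,1.4,0.3,Iris-setosa",
    "14,4.3,3.0,1.1,0.1,Iris-setosa",
]
for line in sort_keep_header(sample):
    print(line)
```

The dots inside the values are only decimal points, so the comma remains the only real separator and nothing in the file has to be changed.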
Sorting multiple keys with Unix sort -- Bug?
I'm trying to sort my data by multiple keys with Unix sort, and I think I get a wrong result. My command is:
sort -t "_" -k4,4 -k2 -k1,1g < stdev.txt
And the result:
0.322_rsrc:15_phi:0.5_abr:1_prof:gauss_diff:lap2.dat 0.000110687417806 0.0346076270248
0.3_rsrc:15_phi:0.5_abr:1_prof:gauss_diff:lap2.dat 0.000111161259827 0.0358869210331
0.321_rsrc:15_phi:0.5_abr:1_prof:gauss_diff:lap2.dat 0.000134981044857 0.0457899948612
0.332_rsrc:15_phi:0.5_abr:1_prof:gauss_diff:lap2.dat 2.79712100925e-05 0.0049473335673
0.313_rsrc:15_phi:0.5_abr:1_prof:gauss_diff:lap2.dat 3.11625097814e-05 0.00588538959351
0.312_rsrc:15_phi:0.5_abr:1_prof:gauss_diff:lap2.dat 3.69066495111e-05 0.00819208397496
0.331_rsrc:15_phi:0.5_abr:1_prof:gauss_diff:lap2.dat 3.69774104969e-05 0.00824956236819
0.311_rsrc:15_phi:0.5_abr:1_prof:gauss_diff:lap2.dat 6.15395637079e-05 0.0173808578728
0.321_rsrc:15_phi:0.5_abr:1_prof:gauss_diff:lap4.dat 0.000138353320007 1.05986015585
0.322_rsrc:15_phi:0.5_abr:1_prof:gauss_diff:lap4.dat 0.00017460061705 0.521775402243
0.311_rsrc:15_phi:0.5_abr:1_prof:gauss_diff:lap4.dat 0.000206502239096 0.149912367819
0.3_rsrc:15_phi:0.5_abr:1_prof:gauss_diff:lap4.dat 0.000237775594814 0.633350656766
0.332_rsrc:15_phi:0.5_abr:1_prof:gauss_diff:lap4.dat 3.1779126554e-05 0.0128586399133
0.313_rsrc:15_phi:0.5_abr:1_prof:gauss_diff:lap4.dat 4.33297503265e-05 0.0166438194725
0.312_rsrc:15_phi:0.5_abr:1_prof:gauss_diff:lap4.dat 7.21521358641e-05 0.0342760190842
0.331_rsrc:15_phi:0.5_abr:1_prof:gauss_diff:lap4.dat 7.52883193115e-05 0.0416052108611
...
0.3_rsrc:8_phi:0.5_abr:2_prof:plaw_diff:point.dat 0.000124446390455 0.00132402479772
0.3_rsrc:8_phi:0.5_abr:2_prof:unif_diff:lap2.dat 1.2638050496e-05 0.0289450596111
0.3_rsrc:8_phi:0.5_abr:2_prof:unif_diff:lap4.dat 0.000100909900236 0.170116521056
0.3_rsrc:8_phi:0.5_abr:2_prof:unif_diff:point.dat 0.000237686616486 0.00142895807647
The first key is read correctly (all abr:2s are at the end). The second key is also read correctly (diff:lap2s come before diff:lap4s). The last key, -k1,1g, is not read properly. According to another SO question, it should use only the first column (0.322, 0.3, etc.) with a general numeric sort. That is not what happens (0.322 comes before 0.3 in the lap2 sector), and in the lap4 sector the ordering is completely different. The command
echo -e '0.3\n0.32\n0.28' | sort -g
gives the correct result. Is it possible to change the field separator -t for each sorting key -k?
-k2 uses all the characters from the beginning of the 2nd field to the end of the line, because you did not specify where the key ends. So the lines
0.322_rsrc:15_phi:0.5_abr:1_prof:gauss_diff:lap2.dat 0.000110687417806 0.0346076270248
0.3_rsrc:15_phi:0.5_abr:1_prof:gauss_diff:lap2.dat 0.000111161259827 0.0358869210331
are correctly sorted, because both keys begin with _rsrc:15 and 0.000110 sorts before 0.000111. The third key, -k1,1g, is only consulted when the first two keys compare equal, which never happens here.
The key phrase in the manual page is:
KEYDEF is F[.C][OPTS][,F[.C][OPTS]] for start and stop position, where F is a field number and C a character position in the field; both are origin 1, and the stop position defaults to the line's end.
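The effect of an unbounded key can be imitated in Python with tuple sort keys. This is only an approximation of sort (it compares plain strings and ignores locale rules), and the bounded variant, roughly -k2,2, is an assumption about what was intended, not something stated in the question.

```python
def key_as_written(line):
    """Approximates: sort -t "_" -k4,4 -k2 -k1,1g

    The second key runs from field 2 to the end of the line, so the
    trailing numbers take part in the comparison before field 1 does.
    """
    f = line.split("_")
    return (f[3], "_".join(f[1:]), float(f[0]))


def key_bounded(line):
    """Approximates: sort -t "_" -k4,4 -k2,2 -k1,1g

    The second key stops at the end of field 2, so ties on it fall
    through to the numeric comparison of field 1.
    """
    f = line.split("_")
    return (f[3], f[1], float(f[0]))


line_a = "0.322_rsrc:15_phi:0.5_abr:1_prof:gauss_diff:lap2.dat 0.000110687417806 0.0346076270248"
line_b = "0.3_rsrc:15_phi:0.5_abr:1_prof:gauss_diff:lap2.dat 0.000111161259827 0.0358869210331"

print([l[:5] for l in sorted([line_b, line_a], key=key_as_written)])
print([l[:5] for l in sorted([line_a, line_b], key=key_bounded)])
```

With the key as written, the 0.322 line comes first, as in the question; with the bounded key, the 0.3 line comes first.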
move a field to the left and up wherever there is a blank space in a file using shell in linux
I am new to shell scripting and am facing a problem while modifying a file. The file contains numeric as well as alphabetical values, e.g.:
QWEE123 1.18E+00 1.28E+00 1.22E+00 1.78E+01 1.77E+01 1.28E+00 1.18E+00 1.28E+00 1.22E+00 6.91E-01 4.20E+00 2.80E+00 7.06E-01 1.92E+00 REE234 3.18E+00 8.28E+00 9.22E+00 8.78E+01 3.77E+01 4.28E+00 7.18E+00 1.28E+00 5.91E-01 6.20E+00 4.80E+00 6.06E-01 4.18E+00 6.28E+00 2.22E+00 3.78E+01 7.77E+01
The output I am looking for:
QWEE123 1.18E+00 1.28E+00 1.22E+00 1.78E+01 1.77E+01 1.28E+00 1.18E+00 1.28E+00 1.22E+00 6.91E-01 4.20E+00 2.80E+00 7.06E-01 1.92E+00 REE234 3.18E+00 8.28E+00 9.22E+00 8.78E+01 3.77E+01 4.28E+00 7.18E+00 1.28E+00 5.91E-01 6.20E+00 4.80E+00 6.06E-01 4.18E+00 6.28E+00 2.22E+00 3.78E+01 7.77E+01
The columns under QWEE123 and REE234 have 5 field values. I want to shift the values from the lines below to fill up the empty fields, so that every row has 5 field values. In the process the number of lines can change. I want to do this for the values below QWEE123, REE234 and so on (there are 2000 more blocks similar to QWEE and REE, each with values below it). How can I do it using a shell script?
This makes several assumptions about your input, but perhaps gives you what you want:
awk '/^[^0-9]/{printf("\n%s\n",$0); n=0} /^[0-9]/{for (i=1;i<=NF;i++) { printf("%s%s",$i, ++n%5 ?" ":"\n")}}' input
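The same re-flowing idea as a Python sketch, for anyone who finds the awk hard to read. It is not a line-for-line port (it does not print a blank line before each header) and it makes the same assumption as the awk: a line that does not start with a digit is a header, and everything else is whitespace-separated values. The name reflow is made up.

```python
def reflow(lines, per_row=5):
    """Re-pack values under each header into rows of `per_row` fields."""
    out = []
    pending = []  # values collected for the row being built

    def flush():
        # Emit whatever has been collected so far as one output row.
        if pending:
            out.append(" ".join(pending))
            pending.clear()

    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        if not tokens[0][0].isdigit():
            flush()  # finish the previous block's last, possibly short, row
            out.append(line.strip())
            continue
        for token in tokens:
            pending.append(token)
            if len(pending) == per_row:
                flush()
    flush()
    return out


demo = ["QWEE123", "1 2 3", "4 5 6 7", "8", "REE234", "9 10"]
print("\n".join(reflow(demo)))
```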
Split a string containing fixed length columns
I got data like this: 3LLO24MACT01 24MOB_6012010051700000020100510105010 123456 It contains different values for different columns when I import it. Every column is fixed width: Col#1 is the ID and just 1 long. Meaning it is "3" here. Col#2 is 3 in length and here "LLO". Col#3 is 9 in length and "24MACT01 " (notice that the missing ones gets filled up by blanks). This goes on for 15 columns or so... Is there a method to quickly cut it into different elements based on sequence length? I couldn't find any.
This can be done with RegEx matching, and creating an array of custom objects. Something like this:
$AllRecords = Get-Content C:\Path\To\File.txt | Where{$_ -match "^(.)(.{3})(.{9})"} | ForEach{
    [PSCustomObject]@{
        'Col1' = $Matches[1]
        'Col2' = $Matches[2]
        'Col3' = $Matches[3]
    }
}
That will take each line, match by how many characters are specified, and then create an object based on those matches. It collects all the objects in an array, which could be exported to CSV or whatever. The 'Col1', 'Col2' etc. are just generic column headers I suggested for lack of better information, and could be anything you want.
Edit: Thank you iCodez for showing me, perhaps inadvertently, that you can specify a language for your code samples!
[Regex]::Matches will do this rather easily. All you need to do is specify a Regex pattern that has . followed by the number of characters you want in curly braces. For example, to match a column of three characters, you would write .{3}. You then do this for all 15 columns.
To demonstrate, I will use a string that contains the first three columns of your example data (since I know their sizes):
PS > $data = '3LLO24MACT01 '
PS > $pattern = '(.{1})(.{3})(.{9})'
PS > ([Regex]::Matches($data, $pattern).Groups).Value
3LLO24MACT01 
3
LLO
24MACT01 
PS >
Note that the first value output will be the text matched by all of the capture groups. If you do not need this, you can remove it with slicing:
$columns = ([Regex]::Matches($data, $pattern).Groups).Value
$columns = $columns[1..$columns.Length]
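For comparison, fixed-width records can also be cut by position instead of by regex. A minimal Python sketch, using only the three column widths given in the question (1, 3 and 9); the remaining widths are unknown here, and split_fixed is a made-up name:

```python
from itertools import accumulate


def split_fixed(line, widths):
    """Cut `line` into consecutive pieces of the given widths."""
    ends = list(accumulate(widths))  # running end offsets: 1, 4, 13, ...
    starts = [0] + ends[:-1]         # each piece starts where the last ended
    return [line[start:end] for start, end in zip(starts, ends)]


print(split_fixed("3LLO24MACT01 ", [1, 3, 9]))
```

The padding blanks are kept, as in the regex version; call .rstrip() on each piece if you do not want them.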
New-PSObjectFromMatches is a helper function for creating PS Objects from regex matches. The -Debug option can help with the process of writing the regex.