Aligning lines in text file - linux

I have data in a text file as below:
E993143|65282
C960954567|50222
P1_ABCDEFG_bbb|26153
A960416|25654
D987747|13410
I would like to have it properly aligned using Linux, as below:
E993143        |65282
C960954567     |50222
P1_ABCDEFG_bbb |26153
A960416        |25654
D987747        |13410
Can somebody help me here?
Note: I cannot use an Excel/spreadsheet format.

You can use awk as follows over your text:
awk -F"|" '{printf("%-15s \t |%-10i\n", $1, $2)}'
Here I have fixed the maximum width of the first column at 15 characters and the second at 10. You can increase these numbers if you expect longer values.
Explanation:
The "-F" flag sets the field delimiter to "|".
"%-15s \t |%-10i\n" defines how the output should be formatted. The '-' in '%-15s' left-aligns the column, and '15s' reserves 15 characters for the string. Similarly, '%-10i' reserves 10 digits for the integer. "\t" adds a tab between the columns and "\n" ends each line.
Output:
➜ test cat test.txt | awk -F"|" '{printf("%-15s \t |%-10i\n", $1, $2)}'
E993143          |65282
C960954567       |50222
P1_ABCDEFG_bbb   |26153
A960416          |25654
D987747          |13410
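If you don't want to hard-code the width, a two-pass awk (my own variation, not part of the answer above; the file name test.txt is an assumption) can compute the widest first column and pad to it:

```shell
# Pass 1 (NR==FNR) finds the widest first column; pass 2 pads to that width.
# The format string is built by concatenation ("%-" w "s"), which works in any awk.
awk -F'|' 'NR==FNR { if (length($1) > w) w = length($1); next }
           { printf "%-" w "s |%s\n", $1, $2 }' test.txt test.txt
```

This reads the file twice, so it only works on a regular file, not a pipe.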

The tool "column" might help you: https://www.stefaanlippens.net/pretty-csv.html. For example:
cat test.csv | column -t -s \| -o \|
"|" needs escaping when passed as a parameter.

Related

Convert floating point numbers to user defined output using AWK

I am trying to convert floating point numbers (columns) from a text file to a user-defined output using awk, e.g. e-01 -> $\exp 10^{-01}$
Test input:
1.2e-01
1.8e-02
1.12e-03
1.222e+04
1.23e+05
441.2e+05
221.2e+06
Expected results:
1.2$\exp 10^{-01}$
1.8$\exp 10^{-02}$
1.12$\exp 10^{-03}$
1.222$\exp 10^{+04}$
1.23$\exp 10^{+05}$
441.2$\exp 10^{+05}$
221.2$\exp 10^{+06}$
I have tried the command awk '{printf "%.4e\n", $1}', which does not solve this problem.
Any help would be really appreciated.
You may use this simple sed substitution with a capturing group and a back-reference:
sed -E 's/e([+-][0-9]+)/$\\exp 10^{\1}$/' file
1.2$\exp 10^{-01}$
1.8$\exp 10^{-02}$
1.12$\exp 10^{-03}$
1.222$\exp 10^{+04}$
1.23$\exp 10^{+05}$
441.2$\exp 10^{+05}$
221.2$\exp 10^{+06}$
Could you please try the following, written and tested with the shown samples in GNU awk.
awk '{sub(/ +$/,"");sub(/e/,"$\\exp ");sub(/[-+]/,"10^{&");$0=$0"}$"} 1' Input_file
Explanation: a detailed explanation of the above.
awk '                  ##Starting awk program from here.
{
sub(/ +$/,"")          ##Substitute any trailing spaces at the end of the line with NULL.
sub(/e/,"$\\exp ")     ##Substitute the first e with $\exp in the current line.
sub(/[-+]/,"10^{&")    ##Substitute either - or + with 10^{ followed by the matched - or +.
$0=$0"}$"              ##Append }$ to the current line.
}
1                      ##1 will print the current line.
' Input_file           ##Mention the Input_file name here.
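To verify, the chain of sub calls can be run directly on a couple of the sample lines:

```shell
# Feed two sample values through the substitution chain.
printf '1.2e-01\n441.2e+05\n' |
awk '{sub(/ +$/,"");sub(/e/,"$\\exp ");sub(/[-+]/,"10^{&");$0=$0"}$"} 1'
# -> 1.2$\exp 10^{-01}$
# -> 441.2$\exp 10^{+05}$
```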
I would treat the input as text and do two subsequent replacements, namely:
awk '{$0=gensub("e", "$\\\\exp 10^", 1); $0=gensub("([-+])([0-9]+)[[:blank:]]*$", "{\\1\\2}$", 1); print}' file.txt
Let file.txt be:
1.2e-01
1.8e-02
1.12e-03
1.222e+04
1.23e+05
441.2e+05
221.2e+06
then output is:
1.2$\exp 10^{-01}$
1.8$\exp 10^{-02}$
1.12$\exp 10^{-03}$
1.222$\exp 10^{+04}$
1.23$\exp 10^{+05}$
441.2$\exp 10^{+05}$
221.2$\exp 10^{+06}$
Explanation: I alter the whole line ($0). Firstly I replace e with $\exp 10^ (\ needs to be escaped), secondly I search for a sign (- or +) followed by one or more digits (and any trailing blanks) at the end of the line, which I replace with {signdigits}$. Finally I print the altered line.

How to convert an uneven tab separated file using sed?

How to convert an uneven TAB separated input file to CSV or PSV using sed command?
28828082-1 04/08/19 08:48 04/11/19 12:37 04/12/19 16:22 4/15-4/16 04/17/19 2 9 LCO W OIP 04/08/19 08:53 21 1 58.00 9 222 79 FEDX FEDXH SL3 484657064673 0410099900691041119 SMITHFIELD RI 02917 "41.890066 , -71.548680" YES
Above is one row. I tried using sed -r 's/^\s+//;s/\s+/|/g' but the result was not as expected.
gawk to the rescue!
$ awk -vFPAT='([^[:space:]]+)|("[^"]+")' -v OFS='|' '$1=$1' file
28828082-1|04/08/19|08:48|04/11/19|12:37|04/12/19|16:22|4/15-4/16|04/17/19|2|9|LCO|W|OIP|04/08/19|08:53|21|1|58.00|9|222|79|FEDX|FEDXH|SL3|484657064673|0410099900691041119|SMITHFIELD|RI|02917|"41.890066 , -71.548680"|YES
Define the field pattern (FPAT, a GNU awk feature) as either a run of non-space characters or a quoted value, which may include spaces (but not escaped quotes); set the output field separator to |; the assignment $1=$1 forces the line to be rebuilt with the new separator, and since the assignment is truthy for non-empty, non-zero lines, those lines are printed after the format change.
A better version would be ... '{$1=$1; print}'.
Of course, if all the field delimiters are tabs and quotes string doesn't include any tabs, it's much simpler.
Your question isn't clear but is this what you're trying to do?
$ printf 'now\t"is the winter"\tof\t"our discontent"\n' > file
$ cat file
now "is the winter" of "our discontent"
$ tr '\t' ',' < file
now,"is the winter",of,"our discontent"
$ tr '\t' '|' < file
now|"is the winter"|of|"our discontent"
Your initial answer was very close:
sed 's/[[:space:]]\+/|/g' input.txt
Explanation:
[[:space:]] matches a single whitespace character such as space/tab/CR/newline.
\+ matches one or more of the preceding pattern.
Update:
If you require two or more whitespace characters as the separator:
sed 's/[[:space:]]\{2,\}/|/g' input.txt
\{2,\} matches two or more of the preceding pattern.
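A quick check of the difference, on a made-up sample where fields are separated by two or more spaces while the quoted value contains only single spaces (an assumption about the data):

```shell
# Runs of 2+ whitespace become |, single spaces inside quotes survive.
printf 'a  b   "c d"  e\n' | sed 's/[[:space:]]\{2,\}/|/g'
# -> a|b|"c d"|e
```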

Trim a string up to 4th delimiter from right side

I have strings like the following which should be parsed using only Unix commands (bash):
49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed
I want to trim strings like the above up to the 4th underscore from the end/right side. So the output should be:
49_sftp_mac_myfile_simul_test
The number of underscores can vary in the overall string. For example, the string could be:
49_sftp_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed
The output should be (after trimming up to the 4th occurrence of the underscore from the right):
49_sftp_simul_test
Easily done using awk by decrementing NF (the number of fields) by 4, after setting the input and output field separators to underscore:
s='49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed'
awk 'BEGIN{FS=OFS="_"} {NF -= 4; $1=$1} 1' <<< "$s"
49_sftp_mac_myfile_simul_test
You can use bash's parameter expansion for that:
string="..."
echo "${string%_*_*_*_*}"
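For example, with the second sample string from the question (% strips the shortest suffix matching the pattern, here four _-separated trailing parts):

```shell
# % removes the shortest suffix matching _*_*_*_*
string='49_sftp_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed'
echo "${string%_*_*_*_*}"
# -> 49_sftp_simul_test
```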
With GNU sed:
$ sed -E 's/(_[^_]*){4}$//' <<< "49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed"
49_sftp_mac_myfile_simul_test
From the end of the line, this removes 4 occurrences of _ followed by non-underscore characters.
Perl one-liner
echo "$your_string" | perl -lne '$n++ while /_/g; print join "_",((split/_/)[-$n-1..-5])'
input
49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed
the output
49_sftp_mac_myfile_simul_test
input
49_sftp_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed
the output
49_sftp_simul_test
Not the fastest, but maybe the easiest to remember and the funniest:
echo "49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed"|
rev | cut -d"_" -f5- | rev

How to filter the required content from a string in Linux?

I have a string like:
sometext sometext BASEDIR=/someword/someword/someword/1342.32 sometext sometext.
Could someone tell me how to filter out the number 1342.32 from the above string in Linux?
$ echo "sometext BASEDIR=/someword/1342.32 sometext." |
sed "s/[^0-9.]//g"
> 1342.32.
The sed command searches for anything not in the set "0123456789" or ".", and replaces it with nothing (deletes it). It does this in global mode, so it doesn't stop on the first match.
This is enough if you're just trying to read it. If you're trying to feed the number into another command and need a real number, you will need to clean it up:
$ ... | cut -f 1-2 -d "."
> 1342.32
cut splits the input on the delimiter, then selects fields 1 and 2 (numbered from one). So "1.2.3.4" would return "1.2".
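Putting the two steps together, with the string from the question:

```shell
# Delete everything that isn't a digit or dot, then keep the first two
# dot-separated fields (dropping the trailing dot from "sometext.").
echo "sometext sometext BASEDIR=/someword/someword/someword/1342.32 sometext sometext." |
sed "s/[^0-9.]//g" | cut -f 1-2 -d "."
# -> 1342.32
```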
If sometext is always delimited from the surrounding fields by whitespace, try this:
awk '{for (i=1;i<=NF;i++) if ($i ~ /BASEDIR/) print $i}' log.txt |
awk -F/ '{for (i=1;i<=NF;i++) if ($i ~ /^[0-9.]+$/) print $i}'
The code snippet above assumes that your data is contained in a file called log.txt and organised in records (read this awk-wise).
This works also if digits appear in sometext before BASEDIR as well as if the input has additional lines:
sed -n 's,.*BASEDIR=\(/\w*\)*/\([0-9.]*\).*,\2,p'
-n do not output lines without BASEDIR…
\(/\w*\)* group of / and someword, repeated
\([0-9.]*\) group of repeated digit or decimal point
\2 replacement of everything matched (the entire line) with the 2nd group
p print the result
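For instance, with digits appearing before BASEDIR (a made-up sample line), GNU sed still extracts only the number (\w is a GNU extension):

```shell
# The leading digits "42" are skipped; only the group after BASEDIR=/.../ is kept.
echo 'build 42 BASEDIR=/someword/someword/1342.32 sometext' |
sed -n 's,.*BASEDIR=\(/\w*\)*/\([0-9.]*\).*,\2,p'
# -> 1342.32
```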

Getting n-th line of text output

I have a script that generates two lines as output each time. I'm really just interested in the second line. Moreover I'm only interested in the text that appears between a pair of #'s on the second line. Additionally, between the hashes, another delimiter is used: ^A. It would be great if I can also break apart each part of text that is ^A-delimited (Note that ^A is SOH special character and can be typed by using Ctrl-A)
output | sed -n '1p' #prints the 1st line of output
output | sed -n '1,3p' #prints the 1st, 2nd and 3rd line of output
your.program | tail -n +2 | cut -d# -f2
should get you 2/3 of the way.
Improving Grumdrig's answer:
your.program | head -n 2| tail -1 | cut -d# -f2
I'd probably use awk for that.
your_script | awk -F# 'NR == 2 && NF == 3 {
num_tokens=split($2, tokens, "^A")
for (i = 1; i <= num_tokens; ++i) {
print tokens[i]
}
}'
This says
1. Set the field separator to #
2. On lines that are the 2nd line, and also have 3 fields (text#text#text)
3. Split the middle (2nd) field using "^A" as the delimiter into the array named tokens
4. Print each token
Obviously this makes a lot of assumptions. You might need to tweak it if, for example, # or ^A can appear legitimately in the data, without being separators. But something like that should get you started. You might need to use nawk or gawk or something, I'm not entirely sure if plain awk can handle splitting on a control character.
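As a concrete check, here is a fabricated two-line output whose second line is foo#a^Ab^Ac#bar (the ^A bytes written as \001 in printf):

```shell
# NR==2 selects the second line; NF==3 requires exactly text#text#text.
printf 'first line\nfoo#a\001b\001c#bar\n' |
awk -F# 'NR == 2 && NF == 3 {
  n = split($2, tokens, "\001")   # \001 is the SOH (^A) byte
  for (i = 1; i <= n; ++i) print tokens[i]
}'
# -> a, b, c on separate lines
```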
bash:
read -r
read -r line
result="${line#*#}"
result="${result%#*}"
IFS=$'\001' read -r -a result <<< "$result"
$result is now an array that contains the elements you're interested in. Just pipe the output of the script to this one.
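Piping a fabricated two-line sample through this approach looks like the following (note that the array variable goes after read's -a option; this is bash-specific syntax):

```shell
# Second line is foo#a^Ab^Ac#bar; we want the ^A-separated middle part.
printf 'first line\nfoo#a\001b\001c#bar\n' | {
  read -r                       # discard the first line
  read -r line                  # keep the second line
  result="${line#*#}"           # strip through the first #
  result="${result%#*}"         # strip the last # and what follows
  IFS=$'\001' read -r -a parts <<< "$result"
  printf '%s\n' "${parts[@]}"   # a, b, c on separate lines
}
```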
here's a possible awk solution
awk -F"#" 'NR==2{
for(i=2;i<=NF;i+=2){
n=split($i,a,"\001")          # split on SOH
for(j=1;j<=n;j++) print a[j]  # print the split tokens in order
}
}' file
