adding double quotes, commas and removing newlines

I have a file that has a list of integers:
12542
58696
78845
87855
...
I want to change them into:
"12542", "58696", "78845", "87855", "..."
(no comma at the end)
I believe I need to use sed but couldn't figure out how. I'd appreciate your help.

You could do a sed multiline trick, but the easy way is to take advantage of shell expansion:
echo $(sed '$ ! s/.*/"&",/; $ s/.*/"&"/' foo.txt)
Run echo $(cat file) to see why this works. The trick, in a nutshell, is that the result of cat is parsed into tokens and interpreted as individual arguments to echo, which prints them separated by spaces.
The sed expression reads
$ ! s/.*/"&",/
$ s/.*/"&"/
...which means: For all but the last line ($ !) replace the line with "line",, and for the last line, with "line".
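The word-splitting effect can be checked on a small sample. The following is a minimal sketch, assuming a throwaway file created with mktemp; the sample values and the variable names tmp and joined are illustrative and not part of the question:

```shell
# Hypothetical sample data in a temporary file
tmp=$(mktemp)
printf '%s\n' 12542 58696 78845 > "$tmp"

# The inner $(...) is deliberately unquoted: the shell splits sed's output
# on whitespace, and echo prints the resulting words separated by spaces.
joined=$(echo $(sed '$ ! s/.*/"&",/; $ s/.*/"&"/' "$tmp"))
printf '%s\n' "$joined"

rm -f "$tmp"
```

Because the expansion is unquoted, a line consisting of * would also undergo pathname expansion, so this form is only suitable for input like the plain integers in the question.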
EDIT: If the file can contain characters the shell expands (unlike the plain integers in OP's case), the following works:
EDIT2: Nicer code for the general case.
sed -n 's/.*/"&"/; $! s/$/,/; 1 h; 1 ! H; $ { x; s/\n/ /g; p; }' foo.txt
Explanation: Written in a more readable fashion, the sed script is
s/.*/"&"/
$! s/$/,/
1 h
1! H
$ {
x
s/\n/ /g
p
}
What this means is:
s/.*/"&"/
Wrap every line in double quotes.
$! s/$/,/
If it isn't the last line, append a comma
1 h
1! H
If it is the first line, overwrite the hold buffer with the result of the previous transformation(s), otherwise append it to the hold buffer.
$ {
x
s/\n/ /g
p
}
If it is the last line (at this point the hold buffer contains the whole file, each line wrapped in double quotes, with commas where appropriate), swap the hold buffer with the pattern space, replace newlines with spaces, and print the result.
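As a quick check of the general case, here is a sketch that wraps the script above in a function and feeds it lines the shell would otherwise expand or split. The function name join_quoted and the sample input are assumptions made for illustration:

```shell
# join_quoted FILE: print all lines of FILE on one line, each wrapped in
# double quotes and separated by ", " (hypothetical helper name).
join_quoted() {
  sed -n 's/.*/"&"/; $! s/$/,/; 1 h; 1 ! H; $ { x; s/\n/ /g; p; }' "$1"
}

# Sample input containing a glob character and an embedded space
tmp=$(mktemp)
printf '%s\n' '*' 'two words' '12542' > "$tmp"
result=$(join_quoted "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Since sed does the joining itself, the shell never word-splits or glob-expands the data.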

Here is the solution,
sed 's/.*/ "&"/' input-file|tr '\n' ','|rev | cut -c 2- | rev|sed 's/^.//'
First, wrap each input line in double quotes (with a leading space):
sed 's/.*/ "&"/' input-file
Then convert the newlines to commas:
tr '\n' ','
The last commands (rev, cut and sed) format the output as required.
Where,
rev reverses the string.
cut -c 2- removes the first character of the reversed string, i.e. the trailing comma.
sed removes the first character of the string, i.e. the leading space.
Output:
"12542", "58696", "78845", "87855"

With perl without any pipes/forks :
perl -0ne 'print join(", ", map { "\042$_\042" } split), "\n"' file
OUTPUT:
"12542", "58696", "78845", "87855"

Here's a pure Bash (Bash≥4) possibility that reads the whole file in memory, so it won't be good for huge files:
mapfile -t ary < file
((${#ary[@]})) && printf '"%s"' "${ary[0]}"
((${#ary[@]}>1)) && printf ', "%s"' "${ary[@]:1}"
printf '\n'
For huge files, this awk seems ok (and will be rather fast):
awk '{if(NR>1) printf ", ";printf("\"%s\"",$0)} END {print ""}' file

One way, using sed:
sed ':a; N; $!ba; s/\n/", "/g; s/.*/"&"/' file
Results:
"12542", "58696", "78845", "87855", "..."

You can write the column oriented values in a row with no comma following the last as follows:
cnt=0
while read -r line || test -n "$line" ; do
if [ "$cnt" = "0" ]; then
printf "\"%s\"" "$line"
else
printf ", \"%s\"" "$line"
fi
cnt=$((cnt + 1))
done < "$1"
printf "\n"
output:
$ bash col2row.sh dat/ncol.txt
"12542", "58696", "78845", "87855"

A simplified awk solution:
awk '{ printf sep "\"%s\"", $0; sep=", " }' file
Takes advantage of uninitialized variables defaulting to an empty string in a string context (sep).
sep "\"%s\"" synthesizes the format string to use with printf by concatenating sep with \"%s\". The resulting format string is applied to $0, each input line.
Since sep is only initialized after the first input record, , is effectively only inserted between output elements.
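A variant of the same idea, sketched under the assumption that a trailing newline is wanted and that the quote character is passed in as an awk variable. The names quote_join and q are illustrative only:

```shell
# quote_join [FILE...]: join input lines as "a", "b", "c" followed by a newline.
quote_join() {
  awk -v q='"' '
    # sep is empty for the first record and ", " for every later one
    { printf "%s%s%s%s", sep, q, $0, q; sep = ", " }
    # terminate the output line, but only if there was any input
    END { if (NR) print "" }
  ' "$@"
}

printf '%s\n' 12542 58696 78845 | quote_join
```

Here the format string is a constant and sep is passed as an argument, so a separator containing % would not be interpreted by printf.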

Related

echo without trimming the space in awk command

I have a file consisting of multiple rows like this
10|EQU000000001|12345678|3456||EOMCO042|EOMCO042|31DEC2018|16:51:17|31DEC2018|SHOP NO.5,6,7 RUNWAL GRCHEMBUR MHIN|0000000010000.00|6761857316|508998|6011|GL
I have to split column 11 into 4 different columns, based on character counts.
This is the 11th column, which also contains extra spaces.
SHOP NO.5,6,7 RUNWAL GRCHEMBUR MHIN
This is what I have done:
ls *.txt *.TXT| while read line
do
subName="$(cut -d'.' -f1 <<<"$line")"
awk -F"|" '{ "echo -n "$11" | cut -c1-23" | getline ton;
"echo -n "$11" | cut -c24-36" | getline city;
"echo -n "$11" | cut -c37-38" | getline state;
"echo -n "$11" | cut -c39-40" | getline country;
$11=ton"|"city"|"state"|"country; print $0
}' OFS="|" $line > $subName$output
done
But when echoing the 11th column, it trims the extra spaces, which leads to a mismatch in the character counts. Is there any way to echo without trimming spaces?
Actual output
10|EQU000000001|12345678|3456||EOMCO042|EOMCO042|31DEC2018|16:51:17|31DEC2018|SHOP NO.5,6,7 RUNWAL GR|CHEMBUR MHIN|||0000000010000.00|6761857316|508998|6011|GL
Expected Output
10|EQU000000001|12345678|3456||EOMCO042|EOMCO042|31DEC2018|16:51:17|31DEC2018|SHOP NO.5,6,7 RUNWAL GR|CHEMBUR|MH|IN|0000000010000.00|6761857316|508998|6011|GL

The least annoying way to code this that I've found so far is:
perl -F'\|' -lane '$F[10] = join "|", unpack "a23 A13 a2 a2", $F[10]; print join "|", @F'
It's fairly straightforward:
Iterate over lines of input; split each line on | and put the fields in @F.
For the 11th field ($F[10]), split it into fixed-width subfields using unpack (and trim trailing spaces from the second field (A instead of a)).
Reassemble subfields by joining with |.
Reassemble the whole line by joining with | and printing it.
I haven't benchmarked it in any way, but it's likely much faster than the original code that spawns multiple shell and cut processes per input line because it's all done in one process.
A complete solution would wrap it in a shell loop:
for file in *.txt *.TXT; do
outfile="${file%.*}$output"
perl -F'\|' -lane '...' "$file" > "$outfile"
done
Or if you don't need to trim the .txt part (and you don't have too many files to fit on the command line):
perl -i.out -F'\|' -lane '...' *.txt *.TXT
This simply places the output for each input file foo.txt in foo.txt.out.

A pure-bash implementation of all this logic
#!/usr/bin/env bash
shopt -s nocaseglob extglob
for f in *.txt; do
subName=${f%.*}
while IFS='|' read -r -a fields; do
location=${fields[10]}
ton=${location:0:23}; ton=${ton%%+([[:space:]])}
city=${location:23:13}; city=${city%%+([[:space:]])}
state=${location:36:2}
country=${location:38:2}
fields[10]="$ton|$city|$state|$country"
printf -v out '%s|' "${fields[@]}"
printf '%s\n' "${out:0:$(( ${#out} - 1 ))}"
done <"$f" >"$subName.out"
done
It's slower (if I did this well, by about a factor of 10) than pure awk would be, but much faster than the awk/shell combination proposed in the question.
Going into the constructs used:
All the ${varname%...} and related constructs are parameter expansion. The specific ${varname%pattern} construct removes the shortest possible match for pattern from the end of the value in varname, or the longest match if % is replaced with %%.
Using extglob enables extended globbing syntax, such as +([[:space:]]), which is equivalent to the regex syntax [[:space:]]+.
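For comparison, the single-process awk approach mentioned above as the faster option could look roughly like this. It is a sketch, assuming the field widths 23/13/2/2 implied by the cut ranges in the question; the function name split_field11 is hypothetical:

```shell
# split_field11 [FILE...]: replace field 11 with four fixed-width subfields.
split_field11() {
  awk -F'|' -v OFS='|' '{
    loc     = $11
    ton     = substr(loc, 1, 23)                         # characters 1-23
    city    = substr(loc, 24, 13); sub(/[ \t]+$/, "", city)  # 24-36, trailing blanks trimmed
    state   = substr(loc, 37, 2)                         # 37-38
    country = substr(loc, 39, 2)                         # 39-40
    $11 = ton OFS city OFS state OFS country             # assigning a field rebuilds $0 with OFS
    print
  }' "$@"
}
```

substr works on the field value directly, so no whitespace is lost to shell word splitting the way it is with the echo and cut pipeline.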

Change case of first word of each line

From the command line, how can I change the first word of each line in a text file to uppercase?
Example input:
hello world
tell me who you are!
Example output:
HELLO world
TELL me who you are!
There are no empty lines, it's ASCII, and each line starts with an alphabetic word followed by a tab.
Tools to use: anything that works on command line on macOS (bash 3.2, BSD sed, awk, tr, perl 5, python 2.7, swift 4, etc.).

You can always just use bash case conversion and a while loop to accomplish what you intend, e.g.
$ while read -r a b; do echo "${a^^} $b"; done < file
HELLO world
HOW are you?
The parameter expansion ${var^^} converts all chars in var to uppercase, ${var^} converts the first letter.
Bash 3.2 - 'tr'
For earlier bash, you can use the same setup with tr with a herestring to handle the case conversion:
$ while read -r a b; do echo "$(tr 'a-z' 'A-Z' <<<"$a") $b"; done < file
HELLO world
HOW are you?
Preserving \t Characters
To preserve the tab separated words, you have to prevent word-splitting during the read. Unfortunately, the -d option to read doesn't allow termination on a set of characters. A way around checking for both space and tab delimited words is to read the entire line with word-splitting disabled (IFS=) and then scan forward through the line until the first literal $' ' or $'\t' is found. (The literals are bash-only, not POSIX shell.) A simple implementation would be:
while IFS= read -r line; do
word=
ct=0
for ((i = 0; i < ${#line}; i++)); do
ct=$i
## check against literal 'space' or 'tab'
[ "${line:$i:1}" = $' ' -o "${line:$i:1}" = $'\t' ] && break
word="${word}${line:$i:1}"
done
word="$(tr 'a-z' 'A-Z' <<<"$word")"
echo "${word}${line:$((ct))}"
done <file
Output of tab Separated Words
HELLO world
HOW are you?
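The character-by-character scan can also be expressed with parameter expansion, which works in bash 3.2 as well. This is a sketch rather than part of the original answer; the function name upper_first_word and the variables word and rest are illustrative:

```shell
# upper_first_word: read lines on stdin, uppercase the first word of each,
# and keep the original separator (space or tab) and the rest of the line.
upper_first_word() {
  while IFS= read -r line || [ -n "$line" ]; do
    word=${line%%[[:space:]]*}   # everything before the first whitespace character
    rest=${line#"$word"}         # the separator and the remainder, untouched
    printf '%s%s\n' "$(printf '%s' "$word" | tr '[:lower:]' '[:upper:]')" "$rest"
  done
}

printf 'hello\tworld\ntell me who you are!\n' | upper_first_word
```

Because rest is taken verbatim from the line, a tab separator comes out as a tab.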

Use awk one-liner:
awk -F$'\t' -v OFS=$'\t' '{ $1 = toupper($1) }1' file

Using GNU sed:
sed 's/^\S*/\U&/g' file
where \S matches a non-whitespace character and \U& uppercases the matched text.
UPDATE: BSD sed does not support most of these escapes, so it is still doable there but requires a much longer expression:
sed -f script file
where the script contains
{
h
s/ .*//
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
G
s/\(.*\)\n[^ ]* \(.*\)/\1 \2/
}

search a line that contain a special character using sed or awk

I wonder if there is a command in Linux that can help me to find a line that begins with "*" and contains the special character "|"
for example
* Date | Auteurs

Simply use:
grep -ne '^\*.*|' "${filename}"
Or if you want to use sed:
sed -n '/^\*.*|/{=;p}' "${filename}" | sed '{N;s/\n/:/}'
Or the (GNU) awk equivalent (the pipe must be backslash-escaped):
awk '/^\*.*\|/' "${filename}"
Where:
^ : start of the line
\*: a literal *
.*: zero or more generic char (not newline)
| : a literal pipe
NB: "${filename}": I've assumed you're using the command in a script with the target file passed in a double-quoted variable as "${filename}". In the shell simply use the actual name of the file (or the path to it).
UPDATE (line numbers)
Modify the above commands to also obtain the line number of the matched lines. With grep it is as simple as adding the -n switch:
grep -ne '^\*.*|' "${filename}"
We obtain an output like this:
81806:* Date | Auteurs
To obtain exactly the same output from sed and awk we have to complicate the commands a little bit:
awk '/^\*.*\|/{print NR ":" $0}' "${filename}"
# the = print the line number, p the actual match but it's on two different lines so the second sed call
sed -n '/^\*.*|/{=;p}' "${filename}" | sed '{N;s/\n/:/}'
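To see the pattern in action, here is a small sketch on made-up input. It also shows a string-based awk test that sidesteps regex escaping altogether; that variant uses substr() and index() and is an alternative, not the method given above. The file contents and variable names are assumptions:

```shell
# Hypothetical three-line sample; only the second line should match
tmp=$(mktemp)
printf '%s\n' 'Title | Page' '* Date | Auteurs' '* no pipe here' > "$tmp"

# Regex form: the line starts with a literal * and contains a pipe after it.
regex_hits=$(grep -n -e '^\*.*|' "$tmp")

# String form: compare the first character and search for a literal pipe.
string_hits=$(awk 'substr($0, 1, 1) == "*" && index($0, "|") { print NR ":" $0 }' "$tmp")

printf '%s\n' "$regex_hits" "$string_hits"
rm -f "$tmp"
```

Both commands report the match in the same number:line form.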

Split string at special character in bash

I'm reading filenames from a text file line by line in a bash script. However, the lines look like this:
/path/to/myfile1.txt 1
/path/to/myfile2.txt 2
/path/to/myfile3.txt 3
...
/path/to/myfile20.txt 20
So there is a second column containing an integer separated by a space. I only need the part of the string before the space.
I have only found solutions using a "for-loop". But I need a function that explicitly looks for the " " character (space) in my string and splits it at that point.
In principle I need the equivalent of MATLAB's "strsplit(str,delimiter)".

If you are already reading the file with something like
while read -r line; do
(and you should be), then pass two arguments to read instead:
while read -r filename somenumber; do
read will split the line on whitespace and assign the first field to filename and any remaining field(s) to somenumber.
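Put together as a complete loop, this might look as follows. It is a minimal sketch: the function name first_fields and the sample file are assumptions, and it only suits paths without embedded whitespace:

```shell
# first_fields FILE: print the first whitespace-separated field of each line.
first_fields() {
  while read -r filename somenumber; do
    # somenumber receives everything after the first field and is ignored here
    printf '%s\n' "$filename"
  done < "$1"
}

tmp=$(mktemp)
printf '%s\n' '/path/to/myfile1.txt 1' '/path/to/myfile2.txt 2' > "$tmp"
names=$(first_fields "$tmp")
printf '%s\n' "$names"
rm -f "$tmp"
```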

Three (of many) solutions:
# Using awk
echo "$string" | awk '{ print $1 }'
# Using cut
echo "$string" | cut -d' ' -f1
# Using sed
echo "$string" | sed 's/\s.*$//g'

If you need to iterate through each line of the file anyway, you can cut off everything after the first space with bash:
while read -r line ; do
# bash string manipulation removes the first space
# and everything which follows it
echo "${line// *}"
done < file

This should work too:
line="${line% *}"
This cuts the string at its last space, so it works even if the path itself contains spaces (as long as the path is followed by a space and the number at the end).
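A short sketch of that behaviour with a path that contains a space. The helper name strip_last_field and the sample path are made up for the example:

```shell
# strip_last_field STRING: drop the last space and everything after it.
strip_last_field() {
  # % removes the shortest suffix matching " *", i.e. from the LAST space onward
  printf '%s\n' "${1% *}"
}

strip_last_field '/path/to/my file.txt 3'
```

Using %% instead of % would remove the longest matching suffix and cut at the first space.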

while read -r line
do
{ rev | cut -d' ' -f2- | rev >> result.txt; } <<< "$line"
done < input.txt
This solution will work even if you have spaces in your filenames.

linux shell title case

I am writing a shell script and have a variable like this: something-that-is-hyphenated.
I need to use it in various points in the script as:
something-that-is-hyphenated, somethingthatishyphenated, SomethingThatIsHyphenated
I have managed to change it to somethingthatishyphenated by stripping out - using sed "s/-//g".
I am sure there is a simpler way, and I also need to know how to get the camel-cased version.
Edit: Working function derived from @Michał's answer
function hyphenToCamel {
tr '-' '\n' | awk '{printf "%s%s", toupper(substr($0,1,1)), substr($0,2)}'
}
CAMEL=$(echo something-that-is-hyphenated | hyphenToCamel)
echo $CAMEL
Edit: Finally, a sed one-liner thanks to @glenn
echo a-hyphenated-string | sed -E "s/(^|-)([a-z])/\u\2/g"

a GNU sed one-liner
echo something-that-is-hyphenated |
sed -e 's/-\([a-z]\)/\u\1/g' -e 's/^[a-z]/\u&/'
\u in the replacement string is documented in the sed manual.

Pure bashism:
var0=something-that-is-hyphenated
var1=(${var0//-/ })
var2=${var1[*]^}
var3=${var2// /}
echo $var3
SomethingThatIsHyphenated
Line 1 is trivial.
Line 2 is the bashism for replaceAll or 's/-/ /g', wrapped in parens, to build an array.
Line 3 uses ${foo^}, which means uppercase (while ${foo,} would mean 'lowercase' [note how ^ points up while , points down]), but to operate on the first letter of every word, we address the whole array with ${foo[*]} (or ${foo[@]}, if you would prefer that).
Line 4 is again a replace-all: blank with nothing.
Line 5 is trivial again.

You can define a function:
hyphenToCamel() {
tr '-' '\n' | awk '{printf "%s%s", toupper(substr($0,1,1)), substr($0,2)}'
}
CAMEL=$(echo something-that-is-hyphenated | hyphenToCamel)
echo $CAMEL

In the shell you are stuck with being messy:
aa="aaa-aaa-bbb-bbb"
echo " $aa" | sed -e 's/--*/ /g' -e 's/ a/A/g' -e 's/ b/B/g' ... -e 's/  *//g'
Note the carefully placed space in the echo and the double space in the last -e.
I leave it as an exercise to complete the code.
In perl it is a bit easier as a one-line shell command:
perl -e 'print map{ $a = ucfirst; $a =~ s/ +//g; $a} split( /-+/, $ARGV[0] ), "\n"' $aa

For the records, here's a pure Bash safe method (that is not subject to pathname expansion)—using Bash≥4:
var0=something-that-is-hyphenated
IFS=- read -r -d '' -a var1 < <(printf '%s\0' "${var0,,}")
printf '%s' "${var1[@]^}"
This (safely) splits the lowercase expansion of var0 at the hyphens, with each split part in array var1. Then we use the ^ parameter expansion to uppercase the first character of the fields of this array, and concatenate them.
If your variable may also contain spaces and you want to act on them too, change IFS=- into IFS='- '.
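Where the Bash 4 case-modification expansions are not available, the same split-capitalize-join steps can be sketched in awk. This is an alternative under that assumption, not the method above; the function name hyphen_to_camel is hypothetical:

```shell
# hyphen_to_camel STRING: something-that-is-hyphenated -> SomethingThatIsHyphenated
hyphen_to_camel() {
  printf '%s\n' "$1" | awk -F- '{
    out = ""
    for (i = 1; i <= NF; i++)                               # one field per hyphen-separated part
      out = out toupper(substr($i, 1, 1)) substr($i, 2)     # capitalize first letter, keep the rest
    print out
  }'
}

hyphen_to_camel something-that-is-hyphenated
```

Unlike the "${var0,,}" version, this sketch leaves the case of the remaining letters unchanged.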
