using sed to change numbers in a csv file - linux

I have a csv file with 3 columns like below
Jones Smith 656220665
I would like to convert it to
Jones Smith 000000000
The problem I have is that not all the numbers are the same length; some are only 7 digits long. I can't find a way to change them from their current format to 0s, and the solution has to use sed and cut.
Here are 2 of the commands I tried to adapt to my needs:
sed 's/\([^ ]*\) \([^_]*\)_\(.*\)/\1 \3/g' Input_file
and
$ sed 's/\(\([^,]\+,\)\{1\}\)\([^,]\+,\)\(.*\)/\1\3\3\4/' /path/to/your/file

Instead of using sed, how about this:
echo 'Jones Smith 656220665' | tr '0-9' '0'
Jones Smith 000000000
For the whole file that's then:
tr '0-9' '0' < file > file.tmp
Edit 1:
added a sed solution:
sed 's/[0-9]/0/g' smith
Jones Smith 000000000
Edit 2:
cat ClientData.csv > ClientData.csv.bak
sed 's/[0-9]/0/g; w ClientData.csv' ClientData.csv.bak | cut -d" " -f 1-2

You can do this very simply with sed general substitution of:
sed 's/[0-9]/0/g'
(where the 'g' provides a global replacement of all instances of [0-9] with '0'), e.g.
$ echo "Jones Smith 656220665" | sed 's/[0-9]/0/g'
Jones Smith 000000000
Give it a shot and let me know if you have further issues.
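If digits can also appear in the name columns, the global substitution above would zero those too. A minimal sketch that touches only the last whitespace-separated field, assuming awk is acceptable and that rebuilding the line with single spaces is fine (the function name zero_last_field is made up for illustration):

```shell
# Hypothetical helper: replace every digit with 0, but only in the last field.
# Note: modifying a field makes awk rebuild the line with single spaces.
zero_last_field() {
  awk '{ gsub(/[0-9]/, "0", $NF); print }' "$@"
}

printf 'Jones Smith 656220665\nAnn Lee2 1234567\n' | zero_last_field
# Jones Smith 000000000
# Ann Lee2 0000000
```

Numbers of any length are handled the same way, since each digit is replaced individually.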

Related

Use values in a column to separate strings in another column in bash

I am trying to separate a column of strings using the values from another column, maybe an example will be easier for you to understand.
The input is a table, with strings in column 2 separated with a comma ,.
The third column is the number of the field to output, with , as the delimiter in the second column.
Ben mango,apple 1
Mary apple,orange,grape 2
Sam apple,melon,* 3
Peter melon 1
The output should look like this, where records that correspond to an asterisk should not be outputted (the Sam row is not outputted):
Ben mango
Mary orange
Peter melon
I am able to generate the desired output using a for loop, but I think it is quite cumbersome:
IFS=$'\n'
for i in $(cat input.txt)
do
F=`echo $i | cut -f3`
paste <(echo $i | cut -f1) <(echo $i | cut -f2 | cut -d "," -f$F) | grep -v "\*"
done
Is there any one-liner to do it maybe using sed or awk? Thanks in advance.
The key to doing it in awk is the split() function, which populates an array based on a regular expression that matches the delimiters to split a string on:
$ awk '{ split($2, fruits, /,/); if (fruits[$3] != "*") print $1, fruits[$3] }' input.txt
Ben mango
Mary orange
Peter melon
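split() also returns the number of pieces it produced, which can be used to skip rows whose index is out of range. A minimal sketch along those lines, assuming whitespace-separated input as in the question (the name pick_field is hypothetical):

```shell
# Hypothetical helper: print column 1 plus the $3-th comma-separated item
# of column 2, skipping "*" items and indexes that are out of range.
pick_field() {
  awk '{
    n = split($2, parts, /,/)            # n = number of items in column 2
    if ($3 >= 1 && $3 <= n && parts[$3] != "*")
      print $1, parts[$3]
  }' "$@"
}

printf 'Ben\tmango,apple\t1\nSam\tapple,melon,*\t3\nPeter\tmelon\t1\n' | pick_field
# Ben mango
# Peter melon
```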

Randomly selecting (units) from a file where a unit is 2 lines.

I want to select random units from a file, where each unit consists of 2 lines.
For example a file looks like this
Adam
Apple
Mindy
Candy
Steve
Chips
David
Meat
Carol
Carrots
And I want to randomly select, let's say, 2 units.
For example
Adam
Apple
David
Meat
or
Steve
Chips
Carol
Carrots
I've tried using shuf and sort -R, but they only shuffle single lines. Could someone help me please?
Thank you.
You could do it with shuf by joining the lines before shuffling (that might not be a bad idea for a file format in general, if the lines describe a single item):
$ < file sed -e 'N;s/\n/:/' | shuf | head -1 | tr ':' '\n'
Carol
Carrots
The sed loads two lines at a time, and joins them with a colon.
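The same join, shuffle, split idea can be sketched for any number of units. This version assumes GNU shuf is available and that no line contains a tab, since a tab is used as the temporary joiner; the name pick_units is made up for illustration:

```shell
# Hypothetical helper: read 2-line units on stdin, print N random units.
pick_units() {
  paste - - |          # join each pair of lines with a tab
    shuf -n "$1" |     # keep N randomly chosen joined lines
    tr '\t' '\n'       # split each pair back into two lines
}

printf 'Adam\nApple\nMindy\nCandy\nSteve\nChips\n' | pick_units 2
```

Each run prints two intact pairs in random order.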
Pick a random number in the correct range, ensure that it is odd (if desired), then use sed to print the 2 lines:
$ a=$(expr $RANDOM % \( $(wc -l < input) / 2 \) \* 2 + 1)
$ sed -n -e ${a}p -e $((a+1))p input
Rather than selecting lines to print, you could walk the file and print each "unit" with a particular probability. For example, to print (roughly) 10% of the "units" in the file, you could do:
awk 'BEGIN{srand()} NR%2 && (rand() < .1) {print; getline; print}' input

Reformat item with line feeds from a CSV file

I have a csv and I would like to know how to replace newline by -, just in the brothers column, with bash:
name,brothers,age,adress
------------------------
john,"marc
peter
paul
alex",18,street
thomas,mike,20,place
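Awk is perfect for this. The command below needs GNU awk (for the three-argument match()) and joins the lines with a space; change " " to "-" in the gsub() call to get hyphens instead: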
awk -v RS='^$' -v ORS= '{while ( match($0,/"[^"]+"/,a) ) {gsub(/\n/," ",a[0]); print substr($0,1,RSTART-1) a[0]; $0=substr($0,RSTART+RLENGTH)} print}' your.csv
outputs:
name,brothers,age,adress
------------------------
john,"marc peter paul alex",18,street
thomas,mike,20,place
Ungainly combo of csvtool, sed, & bash:
csvtool pastecol 2 1- \
input.csv \
<(csvtool col 2 input.csv | \
sed -n '/"/,/"/{:a;N;$!ba;s/\([^"]\)\n/\1-/g;};p') | \
csvtool trim r -
Output:
name,brothers,age,adress
------------------------
john,marc-peter-paul-alex,18,street
thomas,mike,20,place
Except for the sed part, it's not that bad. csvtool replaces column 2 with an edited copy. At the end it trims an extra comma that csvtool stuck in there.
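Where neither GNU awk nor csvtool is at hand, a portable awk sketch can track whether a quoted field is still open by counting the quotes on each physical line. It assumes embedded quotes are escaped by doubling (so parity is preserved) and it keeps the surrounding quotes in the output; join_quoted_newlines is a hypothetical name:

```shell
# Hypothetical helper: join physical lines that fall inside a quoted
# field, using "-" in place of each embedded newline.
join_quoted_newlines() {
  awk '{
    line = $0
    n = gsub(/"/, "\"", line)                 # n = quote count on this line
    if (in_quotes) buf = buf "-" $0           # continuation of an open field
    else           buf = $0                   # start of a new record
    if (n % 2) in_quotes = !in_quotes         # odd count toggles the state
    if (!in_quotes) print buf
  } END { if (in_quotes) print buf }' "$@"    # flush an unterminated record
}

printf 'john,"marc\npeter\npaul\nalex",18,street\nthomas,mike,20,place\n' | join_quoted_newlines
# john,"marc-peter-paul-alex",18,street
# thomas,mike,20,place
```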

How to create a CSV file based on row in shell script?

I have a text file /tmp/some.txt with below values
JOHN YES 6 6 2345762
SHAUN NO 6 6 2345748
I want to create a csv file with below format (i.e based on rows. NOT based on columns).
JOHN,YES,6,6,2345762
SHAUN,NO,6,6,2345748
I tried the code below:
for i in `wc -l /tmp/some.txt | awk '{print $1}'`
do
awk 'NR==$i' /tmp/some.txt | awk '{print $1","$2","$3","$4","$5}' >> /tmp/some.csv
done
Here, wc -l /tmp/some.txt | awk '{print $1}' returns 2 (i.e. 2 rows in the text file).
For each row, awk 'NR==$i' /tmp/some.txt | awk '{print $1","$2","$3","$4","$5}' should print the 5 fields, separated by commas, into the some.csv file.
When I execute each command separately it works, but when I run them as a shell script I get an empty some.csv file.
@Kart: Could you please try the following.
awk '{$1=$1;} 1' OFS=, Input_file > output.csv
I hope this helps you.
I suggest:
sed 's/[[:space:]]\+/,/g' /tmp/some.txt
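You almost got it. awk already processes the file row by row, so you don't need to iterate with the for loop. (In your script the $i sits inside single quotes, so the shell never expands it and awk's NR==$i matches no row, which is why some.csv ends up empty.)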
So you just need to run:
awk '{print $1","$2","$3","$4","$5}' /tmp/some.txt >> /tmp/some.csv
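If a shell value such as a row number really does need to reach awk, the usual route is awk's -v option rather than embedding $i in the single-quoted program. A minimal sketch, with row_as_csv as a made-up name and input read from stdin:

```shell
# Hypothetical helper: print row N of whitespace-separated input as CSV.
# -v copies the shell value into the awk variable n before the program runs.
row_as_csv() {
  awk -v n="$1" -v OFS=, 'NR == n { $1 = $1; print }'
}

printf 'JOHN YES 6 6 2345762\nSHAUN NO 6 6 2345748\n' | row_as_csv 2
# SHAUN,NO,6,6,2345748
```

The $1 = $1 assignment forces awk to rebuild the line using OFS, which is what turns the spaces into commas.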
With tr, squeezing (-s), and then transliterating space/tab ([:blank:]):
tr -s '[:blank:]' ',' <file.txt
With sed, substituting one or more space/tab with ,:
sed 's/[[:blank:]]\+/,/g' file.txt
With awk, replacing one or more space/tab with , using the gsub() function (printing every line, including any that contain no blanks):
awk '{ gsub(/[[:blank:]]+/, ","); print }' file.txt
Example
% cat foo.txt
JOHN YES 6 6 2345762
SHAUN NO 6 6 2345748
% tr -s '[:blank:]' ',' <foo.txt
JOHN,YES,6,6,2345762
SHAUN,NO,6,6,2345748
% sed 's/[[:blank:]]\+/,/g' foo.txt
JOHN,YES,6,6,2345762
SHAUN,NO,6,6,2345748
% awk '{ gsub(/[[:blank:]]+/, ","); print }' foo.txt
JOHN,YES,6,6,2345762
SHAUN,NO,6,6,2345748

How to use cut and paste commands as a single line command?

In Unix, I am trying to write a sequence of cut and paste commands (saving the result of each command in a file) that inverts every name in the file shortlist (below) and places a comma after the last name (for example, bill johnson becomes johnson, bill).
here is my file shortlist:
2233:charles harris :g.m. :sales :12/12/52: 90000
9876:bill johnson :director :production:03/12/50:130000
5678:robert dylan :d.g.m. :marketing :04/19/43: 85000
2365:john woodcock :director :personnel :05/11/47:120000
5423:barry wood :chairman :admin :08/30/56:160000
I am able to cut from shortlist, but I am not sure how to paste the result into my filenew file in the same command line. Here is my code for cut:
cut -d: -f2 shortlist
result:
charles harris
bill johnson
robert dylan
john woodcock
barry wood
Now I want this to be pasted in my filenew file and when I cat filenew, result should look like below,
harris, charles
johnson, bill
dylan, robert
woodcock, john
wood, barry
Please guide me through this. Thank you.
You could do it with a single awk:
awk -F: '{split($2, a, / /); l = (a[2] != "" ? a[2] ", " : ""); print l a[1]}' shortlist
I am assuming that if you don't have a second name, you don't want to print the comma (and you don't have more than 2 words in the name).
Once you've used cut to split up the string, it may be easier to use awk than paste to produce the result you want:
$ cut -d":" -f2 shortlist | awk '{printf "%s, %s\n", $2, $1}'
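For the cut-and-paste route the question asks about, with each intermediate result saved in a file, one possible sketch follows. It assumes every name has exactly two space-separated words, and it uses a small sed step because paste -d accepts only single-character delimiters; the function name invert_names and the temporary file names are made up for illustration:

```shell
# Hypothetical helper: read the colon-separated list on stdin and print
# "last, first" for each record, using cut and paste on temporary files.
invert_names() {
  tmpdir=$(mktemp -d) || return 1
  cut -d: -f2 > "$tmpdir/names"                        # "first last" column
  cut -d' ' -f1 "$tmpdir/names" > "$tmpdir/first"      # first names
  cut -d' ' -f2 "$tmpdir/names" > "$tmpdir/last"       # last names
  paste -d, "$tmpdir/last" "$tmpdir/first" | sed 's/,/, /'
  rm -r "$tmpdir"
}

printf '9876:bill johnson :director :production:03/12/50:130000\n' | invert_names
# johnson, bill
```

To save the result, redirect the output: invert_names < shortlist > filenew.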

Resources