How to remove a value between the 5th and 6th separator using SED? - linux

Remove the value between the 5th and 6th separators:
000000000000;00000000000000;2;NONE;true;526;246;101;100;2;1;;;;;;8;101/100.0000.99.99;526/125.000.122.000
We need to get:
000000000000;00000000000000;2;NONE;true;;246;101;100;2;1;;;;;;8;101/100.0000.99.99;526/125.000.122.000

Using awk you can do this:
s='000000000000;00000000000000;2;NONE;true;526;246;101;100;2;1;;;;;;8;101/100.0000.99.99;526/125.000.122.000'
awk 'BEGIN{FS=OFS=";"} {$6=""} 1' <<< "$s"
000000000000;00000000000000;2;NONE;true;;246;101;100;2;1;;;;;;8;101/100.0000.99.99;526/125.000.122.000
FS=OFS=";" sets input and output field separators as ;
$6="" makes 6th field empty
1 prints the whole record
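The same idea can be wrapped so that the field number and separator are parameters. This is only a sketch: blank_field is a hypothetical helper name, and it assumes a single-character separator. Note that awk pads the record with empty fields if the line has fewer than N fields.

```shell
# blank_field N SEP: empty the Nth SEP-separated field of every input line.
# (blank_field is a hypothetical helper name, not an awk built-in.)
blank_field() {
    awk -v n="$1" -v sep="$2" 'BEGIN { FS = OFS = sep } { $n = "" } 1'
}

s='000000000000;00000000000000;2;NONE;true;526;246;101;100;2;1;;;;;;8;101/100.0000.99.99;526/125.000.122.000'
printf '%s\n' "$s" | blank_field 6 ';'
```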

Let's define your string as s:
$ s='000000000000;00000000000000;2;NONE;true;526;246;101;100;2;1;;;;;;8;101/100.0000.99.99;526/125.000.122.000'
To remove the sixth field:
$ echo "$s" | sed -E 's/(([^;]*;){5})[^;]*/\1/'
000000000000;00000000000000;2;NONE;true;;246;101;100;2;1;;;;;;8;101/100.0000.99.99;526/125.000.122.000
How it works
We use a single sed substitution command:
s/(([^;]*;){5})[^;]*/\1/
Here, (([^;]*;){5}) matches the first five fields and saves them in group 1.
[^;]* matches the field that follows. In other words, it matches the sixth field.
The replacement text is just \1 which means group 1 which is the first five fields. Thus, the sixth field is removed and not replaced.
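If the field number varies, the repeat count can be computed by the shell. A minimal sketch, assuming the separator is ; and N is at least 2; remove_field_sed is a hypothetical name, and the ^ anchor is added here only to make the intent explicit:

```shell
# remove_field_sed N: empty the Nth ';'-separated field (assumes N >= 2).
# (remove_field_sed is a hypothetical helper name.)
remove_field_sed() {
    skip=$(( $1 - 1 ))                      # number of fields to keep in front
    sed -E "s/^(([^;]*;){$skip})[^;]*/\1/"  # double quotes so $skip expands
}

printf '%s\n' 'a;b;c;d;e;f;g' | remove_field_sed 6
```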

Related

How to add a Header with value after a particular column in linux

I want to add a column with the header Gender, filled with a value, after the Age column.
cat Person.csv
First_Name|Last_Name||Age|Address
Ram|Singh|18|Punjab
Sanjeev|Kumar|32|Mumbai
I am using this:
cat Person.csv | sed '1s/$/|Gender/; 2,$s/$/|Male/'
output:
First_Name|Last_Name||Age|Address|Gender
Ram|Singh|18|Punjab|Male
Sanjeev|Kumar|32|Mumbai|Male
I want output like this:
First_Name|Last_Name|Age|Gender|Address
Ram|Singh|18|Male|Punjab
Sanjeev|Kumar|32|Male|Mumbai
I took the second pipe out (for consistency's sake) ... the sed should look like this:
$ sed -E '1s/^([^|]+\|[^|]+\|[^|]+\|)/\1Gender|/;2,$s/^([^|]+\|[^|]+\|[^|]+\|)/\1Male|/' Person.csv
First_Name|Last_Name|Age|Gender|Address
Ram|Singh|18|Male|Punjab
Sanjeev|Kumar|32|Male|Mumbai
We match and remember the first three fields and replace them with themselves, followed by Gender on the header line and Male on every other line.
Using awk:
$ awk -F"|" 'BEGIN{ OFS="|"}{ last=$NF; $NF=""; print (NR==1) ? $0"Gender|"last : $0"Male|"last }' Person.csv
First_Name|Last_Name||Age|Gender|Address
Ram|Singh|18|Male|Punjab
Sanjeev|Kumar|32|Male|Mumbai
Use '|' as the input field separator and set the output field separator to '|' as well. Store the last column's value in a variable named last, then empty the last column with $NF="". Finally, print the appropriate output depending on whether it is the first row or a later row.
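Another way to approach this is to append the new value to the column it should follow, which lets the position be a parameter. A sketch only: insert_col is a hypothetical name, and it assumes the header has the same number of fields as the data rows (that is, the stray second pipe has been removed).

```shell
# insert_col K HEADER VALUE: add a column after column K of '|'-separated input.
# (insert_col is a hypothetical helper name.)
insert_col() {
    awk -v k="$1" -v hdr="$2" -v val="$3" 'BEGIN { FS = OFS = "|" }
        { $k = $k OFS (NR == 1 ? hdr : val) }   # header text on line 1, value elsewhere
        1'
}

printf '%s\n' 'First_Name|Last_Name|Age|Address' 'Ram|Singh|18|Punjab' |
    insert_col 3 Gender Male
```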

How to replace some cells number of .csv file if specific lines found in Linux

Let's say I have the following file.csv content:
"US","BANANA","123","100","0.5","ok"
"US","APPLE","456","201","0.1", "no"
"US","PIE","789","109","0.8","yes"
"US","APPLE","245","201","0.4","no"
I want to find all lines that contain APPLE and 201, and then set the value in column 5 to 0. So my output would look like:
"US","BANANA","123","100","0.5","ok"
"US","APPLE","456","201","0", "no"
"US","PIE","789","109","0.8","yes"
"US","APPLE","245","201","0","no"
I can do a grep search
grep "APPLE" file.csv | grep 201
to find the lines, but I could not figure out how to modify the column 5 values of these lines in the original file.
You can use awk for this:
awk 'BEGIN{FS=OFS=","} $2=="\"APPLE\"" { for (i=1;i<=NF;i++) { if ($i=="\"201\"") { $5="\"0\""; break } } }1' file.csv
Set both the input and output field separators to , so that the line is rebuilt with commas when a field changes. When the second field is equal to APPLE in quotes, loop through each field and check whether it is equal to 201 in quotes. If it is, set the 5th field to 0 in quotes. The trailing 1 is shorthand that prints each line, changed or otherwise.
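Since the question asks for the change to land in the original file, here is a hedged sketch that writes to a temporary file and moves it over the original. It assumes, as in the sample data, that 201 always sits in column 4; zero_col5 is a hypothetical name.

```shell
# zero_col5 FILE: set column 5 to "0" on rows whose 2nd field is "APPLE"
# and whose 4th field is "201", then replace FILE with the result.
# (zero_col5 is a hypothetical helper name; column 4 is an assumption.)
zero_col5() {
    tmp=$(mktemp) || return 1
    awk 'BEGIN { FS = OFS = "," }
         $2 == "\"APPLE\"" && $4 == "\"201\"" { $5 = "\"0\"" }
         1' "$1" > "$tmp" && mv "$tmp" "$1"
}
```

Calling zero_col5 file.csv rewrites the file only if awk succeeds, so a failed run leaves the original untouched.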

How can I use grep and regular expression to display names with just 3 characters

I am new to grep and UNIX. I have a sample of data and want to display all the first names that contain exactly three characters, e.g. Lee_example, but I am having some difficulty doing that. I am currently using cat file.txt|grep -E "[A-Z][a-z]{2}", but it displays every name that contains at least 3 characters, not only those with exactly 3.
Sample data
name
number
Lee_example
1
Hector_exaple
2
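You need to match the _ after the first name, and anchor the pattern at the start of the line so that it cannot match partway through a longer name.
grep -E "^[A-Z][a-z]{2}_"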
With awk:
awk -F_ 'length($1)==3{print $1}'
-F_ tells awk to split the input lines on _. length($1) == 3 checks whether the first field (the name) is 3 characters long, and {print $1} prints the name in that case.
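One caveat with the awk version is that a line with no _ at all is treated as a single field, so a three-character line would also be printed. A small sketch that guards against this by requiring at least two fields; three_letter_names is a hypothetical name, and the abc line is added here purely to illustrate the guard:

```shell
# three_letter_names: print first names of exactly 3 characters,
# ignoring lines that contain no underscore.
# (three_letter_names is a hypothetical helper name.)
three_letter_names() {
    awk -F_ 'NF > 1 && length($1) == 3 { print $1 }'
}

printf '%s\n' name number Lee_example 1 Hector_exaple 2 abc | three_letter_names
# prints: Lee
```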

Is there a way to remove only consecutive duplicates?

I have a CSV input with these columns:
1,zzzz,xxxx,
1,xxxx,xyxy,
2,xxxx,xxxx,
3,yyyy,xxxx,
3,xxxx,yyyy,
3,xxxx,zzzz,
1,ffff,xxxx,
1,aaaa,xxxx,
And I need to discard lines where the first field matches that of the preceding line:
1,zzzz,xxxx,
2,xxxx,xxxx,
3,yyyy,xxxx,
1,ffff,xxxx,
I tried sort | uniq alone, but it didn't work because all lines are different except for the first field (the number).
Use awk instead of uniq:
awk -F, '$1 != last { last=$1; print }'
-F, sets the field separator to comma. $1 is the contents of the first field, so this prints the line whenever the first field changes.
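I got the wanted output with uniq --check-chars=N (short form -w N in GNU uniq), which compares only the first N characters of each line. Because uniq only collapses adjacent duplicate lines, the input does not need to be sorted, and the same key can appear again later in the list. Note that this relies on the first field having a fixed width of N characters.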
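To illustrate the difference between the two approaches, here is a sketch with a made-up input whose keys have different widths. It assumes GNU uniq for the -w option; first_of_run is a hypothetical name, and the NR == 1 guard is an addition so that an empty first key on the first line is still printed.

```shell
# first_of_run: keep the first line of each run of equal first fields.
# (first_of_run is a hypothetical helper name.)
first_of_run() {
    awk -F, 'NR == 1 || $1 != last { print } { last = $1 }'
}

# Hypothetical input with keys of different widths.
data='1,a
10,b
10,c
2,d'

printf '%s\n' "$data" | uniq -w1       # prints 1,a and 2,d: key 10 is mistaken for 1
printf '%s\n' "$data" | first_of_run   # prints 1,a then 10,b then 2,d
```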

Replace every n'th occurrence in huge line in a loop

I have this line for example:
1,2,3,4,5,6,7,8,9,10
I want to insert a newline (\n) at every 2nd occurrence of "," (that is, replace every 2nd comma with a newline).
This might work for you (GNU sed):
sed 's/,/\n/2;P;D' file
If I understand what you're trying to do correctly, then
echo '1,2,3,4,5,6,7,8,9,10' | sed 's/\([^,]*,[^,]*\),/\1\n/g'
seems like the most straightforward way. \([^,]*,[^,]*\) will capture 1,2, 3,4, and so forth, and the commas between them are replaced with newlines through the usual s///g. This will print
1,2
3,4
5,6
7,8
9,10
I can't comment on Wintermute's answer, but the pattern doesn't need the first [^,]* part: the first comma it matches is always preceded by a field anyway.
sed 's/\(,[^,]*\),/\1\n/g'
will work the same.
I'll also add another alternative (albeit a worse one, which leaves a trailing newline):
echo "1,2,3,4,5,6,7,8,9,10" | xargs -d"," -n2 | tr ' ' ','
I would use awk to do this:
$ awk -F, '{ for (i=1; i<=NF; ++i) printf "%s%s", $i, (i%2?FS:RS) }' file
1,2
3,4
5,6
7,8
9,10
It loops through each field, printing each one followed by either the field separator (defined as a comma) or the record separator (a newline) depending on the value of i%2.
It's slightly longer than the sed versions presented by others, although one nice thing about it is that you can alter the number of columns per line easily by changing the 2 to whatever value you like.
To avoid a trailing comma after the last field in the case where the number of fields isn't evenly divisible, you can change the ternary to i<NF&&i%2?FS:RS.
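Putting both remarks together, the column count can be passed in as a variable. A minimal sketch, where wrap_fields is a hypothetical name and n is the number of fields per output line:

```shell
# wrap_fields N: print comma-separated fields N per line.
# (wrap_fields is a hypothetical helper name.)
wrap_fields() {
    awk -F, -v n="$1" '{
        for (i = 1; i <= NF; ++i)
            # comma inside a group, newline at the end of a group or of the record
            printf "%s%s", $i, (i < NF && i % n ? FS : RS)
    }'
}

printf '1,2,3,4,5,6,7,8,9,10\n' | wrap_fields 3
```

With 3 columns per line, the ten sample fields come out as 1,2,3 then 4,5,6 then 7,8,9 then 10, with no trailing comma on the short last line.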
