Calculate the sum of a column, excluding the commas inside numbers, in bash scripting - Linux

This is the file (calculate.csv) that I have.
column1,column2,column3
10,'rohit', 123
20,'warner',-23
30,'anna',234
40,'shreya',19
50,'shravs',89
60,'vasu',12
100,'ajay',87
"1,000",'sumanth',-8
"2,000",'arjun',"1,228"
I need a command to calculate the sum of column1, but a plain comma-split won't work for "1,000" and "2,000".
Is there any other way to ignore a comma between quotes ""?
I need a sed command that ignores the comma when it is between "".
For example, "1,000",'sumanth' should become 1000,'sumanth'.
The output should be 3310 for the sum of column1.
The output should be 1761 for the sum of column3.
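One way to get there, sketched assuming GNU sed and GNU awk: first delete the commas inside double-quoted numbers (and then the quotes themselves), then sum the wanted column with awk, skipping the header line.
sed -e ':a' -e 's/"\([0-9]*\),\([0-9,]*\)"/"\1\2"/' -e 'ta' -e 's/"\([0-9]*\)"/\1/g' calculate.csv |
awk -F, 'NR > 1 { sum += $1 } END { print sum }'
The :a/ta loop removes one comma per pass, so a value like "1,000,000" would be handled too; swap $1 for $3 to sum column3. Alternatively, GNU awk can do the field splitting by itself via FPAT, which describes what a field looks like rather than what separates fields:
gawk -v FPAT='([^,]+)|("[^"]+")' 'NR > 1 { gsub(/[",]/, "", $1); sum += $1 } END { print sum }' calculate.csv
On the sample file both print 3310 for column1 and, with $3 substituted, 1761 for column3.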

Related

Find if the first 10 digits of two columns in a CSV file are matched in bash

I have a file (names.csv) which contains two columns, with values separated by a comma:
,
a123456789-anything,a123456789-anything
b123456789-anything,b123456789-anything
c123456789-anything,c123456789-anything
d123456789-anything,d123456789-anything
e123456789-anything,e123456789-anything
e123456777-anything,e123456999-anything
These columns hold values whose first 10 digits are unique identifiers, plus some extra junk in the values (-anything).
I want to check whether each pair of columns has matching prefixes.
To inspect the values in the first and second columns I use:
cat /home/names.csv | parallel --colsep ',' echo column 1 = {1} column 2 = {2}
This prints the values. Because the values are hex digits, it is cumbersome to verify them one by one just by reading. Is there any way to see whether the first 10 digits of each column pair are exact matches? They might contain special characters!
Expected output (example, but anything that says the columns are matched or not can work):
Matches (including first line):
,
a123456789-anything,a123456789-anything
b123456789-anything,b123456789-anything
c123456789-anything,c123456789-anything
d123456789-anything,d123456789-anything
e123456789-anything,e123456789-anything
Non-matches
e123456777-anything,e123456999-anything
Here's one way using awk. It prints every line where the first 10 characters of the first two fields match.
% cat /tmp/names.csv
,
a123456789-anything,a123456789-anything
b123456789-anything,b123456789-anything
c123456789-anything,c123456789-anything
d123456789-anything,d123456789-anything
e123456789-anything,e123456789-anything
e123456777-anything,e123456999-anything
% awk -F, 'substr($1,1,10)==substr($2,1,10)' /tmp/names.csv
,
a123456789-anything,a123456789-anything
b123456789-anything,b123456789-anything
c123456789-anything,c123456789-anything
d123456789-anything,d123456789-anything
e123456789-anything,e123456789-anything
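And to list the non-matches instead, the same test can simply be inverted:
% awk -F, 'substr($1,1,10) != substr($2,1,10)' /tmp/names.csv
e123456777-anything,e123456999-anything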

I am trying to add multiple users from a CSV in Linux using CentOS [duplicate]

I have
while read $field1 $field2 $field3 $field4
do
$trimmed=$field2 | sed 's/ *$//g'
echo "$trimmed","$field3" >> new.csv
done < "$FEEDS"/"$DLFILE"
Now the problem is that with read I can't make it split fields CSV-style, can I? See the input CSV format below.
I need to get columns 3 and 4 out, stripping the padding from column 2, and I don't need the quotes.
CSV format with character positions:
1(") 2-23(Field1) 24(") 25(,) 26(") 27-41(Field2values) 42(") 43(,) 44(Field3 decimal values)
"Field1_constant_value","Field2values ",Field3,Field4
Field1 is constant and irrelevant. Its data is quoted and runs from columns 2-23 inside the quotes.
Field2 is fixed-width, occupying columns 27-41 inside quotes, with the data at the left and padded by spaces on the right.
Field3 is a decimal number with 1, 2, or 3 digits before the decimal point and 2 after, no padding. Starts at col 74.
Field4 is a date and I don't much care about it right now.
Yes, you can use read; all you've got to do is set the shell variable IFS (the Internal Field Separator) so that read splits lines on your own delimiter instead of its default value (whitespace).
Considering an input file "a.csv", with the given contents:
1,2,3,4
2,3,4,5
6,3,2,1
You can do this:
# Setting IFS on the read command keeps the change local to it,
# and -r stops read from mangling backslashes.
while IFS=',' read -r f1 f2 f3 f4; do
    echo "fields[$f1 $f2 $f3 $f4]"
done < a.csv
And the output is:
fields[1 2 3 4]
fields[2 3 4 5]
fields[6 3 2 1]
A good starting point for you is here: http://backreference.org/2010/04/17/csv-parsing-with-awk/
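Applied to your own format, a rough sketch (assuming no commas occur inside the quoted fields, and keeping your sed-based trim for the padding):
while IFS=',' read -r f1 f2 f3 f4; do
    f2=${f2//\"/}                           # drop the surrounding quotes
    f2=$(printf '%s' "$f2" | sed 's/ *$//') # strip the right-hand space padding
    echo "$f2,$f3"
done < "$FEEDS/$DLFILE" >> new.csv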

How to edit the left hand column and replace with values in Linux?

I have the following text file:
[screenshot of the .txt file from the original post]
In the left-hand column all the values are '0'. Is there a way to change only the left-hand column, replacing all the zeros with the value 15? I can't use find-and-replace, as other columns contain '0' values which must not be altered, and this also can't be done manually as the file contains 10,000 lines. I'm wondering if this is possible from the command line or with a script.
Thanks
Using awk:
awk '$1 == 0 { $1 = 15 } 1' file.txt
Replaces the first column with 15 on each line only if the original value is 0.
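awk writes the result to standard output; to update the file itself, redirect to a temporary file and move it back, or, with GNU awk 4.1 or later, edit in place:
gawk -i inplace '$1 == 0 { $1 = 15 } 1' file.txt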

Align text after n-th column in vim removing unnecessary blanks

In vim, on a Windows machine (with no access to "unix"-like commands such as column) I want to reformat this code to make it more readable:
COLUMN KEY_ID          FORMAT 9999999999
COLUMN VALUE_1     FORMAT 99
COLUMN VALUE_2  FORMAT 99
COLUMN VALUE_3        FORMAT 999
COLUMN VALUE_4   FORMAT 999
And I want to get this, using as few commands as possible:
COLUMN KEY_ID   FORMAT 9999999999
COLUMN VALUE_1  FORMAT 99
COLUMN VALUE_2  FORMAT 99
COLUMN VALUE_3  FORMAT 999
COLUMN VALUE_4  FORMAT 999
Note this is just an excerpt, as there are many more lines in which I must do the same.
You could use the following command:
:%s/\w\zs\s*\zeFORMAT/^I
The pattern matches the whitespace between the end of the previous word and FORMAT, and replaces it with a tab:
\w Any 'word' character
\zs Start of the match
\s* Any number of whitespace characters
\ze End of the match
FORMAT The literal word FORMAT
\zs and \ze restrict the substitution to the whitespace only; see :h /\zs and :h /\ze
Note that ^I is a literal tab, inserted with Ctrl+V followed by Tab
The tabular plugin recommended by @SatoKatsura would be a good way to do it too.
You can also generalize that. Let's say you have the following file:
COLUMN KEY_ID FORMAT 9999999999
COLUMN VALUE_1 FOO 99
COLUMN VALUE_2 BAR 99
You could use this command:
:%s/^\(\w*\s\)\{1}\w*\zs\s*\ze/^I
Where the pattern can be detailed like this:
^ Match the beginning of the line
\(\w*\s\)\{1} One occurrence of the pattern \w*\s i.e. one column
\w* Another column
\zs\s*\ze The whitespaces after the previous column
You could change the value of \{1} to apply the command to the following columns, as sketched below.
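For instance, assuming the same kind of file, \{2} skips two columns, so the substitution targets the whitespace after the third column instead:
:%s/^\(\w*\s\)\{2}\w*\zs\s*\ze/^I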
EDIT to answer @aturegano's comment, here is a way to align the column to another one:
%s/^\(\w*\s\)\{1}\w*\zs\s*\ze/\=repeat(' ', 30-matchstrpos(getline('.'), submatch(0))[1])
The idea is still to match the whitespace which must be aligned; in the replacement part of the substitution command we use a sub-replace-expression (see :h sub-replace-expression).
This allows us to build the replacement from an expression, which can be explained like this:
\= Interpret the next characters as an expression
repeat(' ', XX) Replace the match with XX spaces
XX is computed like this:
30- 30 minus the next expression
matchstrpos()[1] Returns the column at which the second argument appears in the first one
getline('.') The current line (i.e. the one containing the match)
submatch(0) The matched string
[1] Necessary since matchstrpos() returns a list:
[matchedString, StartPosition, EndPosition]
and we are looking for the second value.
You then simply have to replace 30 by the column where you want to move your next column.
See :h matchstrpos(), :h getline() and :h submatch()
For alignment, there are three well-known plugins:
the venerable Align - Help folks to align text, eqns, declarations, tables, etc
the modern tabular
the contender vim-easy-align
Posting an answer as requested:
:g/^COLUMN / s/.*/\=call('printf', ['%s %-30s %s %s'] + split(submatch(0)))/
Explanation:
g/^COLUMN / - apply the following command to lines matching /^COLUMN / (cf. :h :global)
\= - replace with the result of evaluating an expression, rather than with a fixed string (cf. :h s/\=)
submatch(0) - the line being matched
split(...) - split line into words
printf(...) - format the line
call(...) - we'd like to have printf('%s %-30s %s %s', list), but printf() doesn't take "real" lists as arguments, so we have to unfold the list with a call(...) (cf. :h call()).
Yet another solution:
:%s/ \{2,}/ /g
This solution is not perfect because the result will have an extra single space on the first line. To fix this problem:
:%s/\%>15c \{2,}/ /g
Explanation of the pattern \%>15c \{2,}:
\%>15c Matches only after column 15
 \{2,} Matches two or more spaces (a literal space followed by \{2,})

Remove extra commas from only 2nd and 3rd row of CSV file

I have a comma delimited file (CSV file) test.csv as shown below.
FHEAD,1,2,3,,,,,,
FDEP,2,3,,,,,,,,
FCLS,3,,,4-5,,,,,,,
FDETL,4,5,6,7,8,
FTAIL,5,67,,,,,,
I want to remove the empty columns only from the 2nd and 3rd rows of the file, i.e. wherever the record starts with FDEP or FCLS I want to remove the empty columns (,,).
After removing the empty columns, the same file test.csv should look like:
FHEAD,1,2,3,,,,,,
FDEP,2,3
FCLS,3,4-5
FDETL,4,5,6,7,8,
FTAIL,5,67,,,,,,
How can I do this in Unix?
Here's one way to do it, using sed:
sed '/^F\(DEP\|CLS\),/ { s/,\{2,\}/,/g; s/,$// }'
We use the address /^F\(DEP\|CLS\),/, i.e. the commands in the braces will only process lines matching ^F\(DEP\|CLS\),. This regex matches beginning-of-line, followed by F, followed by either DEP or CLS, followed by ,. In other words, we look for lines starting with FDEP, or FCLS,.
Having found such a line, we first substitute (the s command) every run of 2 or more (\{2,\}) commas with a single , across the whole line (g flag). This squeezes ,,, down to a single ,.
Second, we substitute a , at end-of-line with nothing. This gets rid of any trailing comma.
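Run against the sample file it produces the desired output (note that \| alternation is a GNU sed extension; with GNU sed you can also add -i to edit test.csv in place):
$ sed '/^F\(DEP\|CLS\),/ { s/,\{2,\}/,/g; s/,$// }' test.csv
FHEAD,1,2,3,,,,,,
FDEP,2,3
FCLS,3,4-5
FDETL,4,5,6,7,8,
FTAIL,5,67,,,,,,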
