I am trying to add multiple users from a CSV in Linux using CentOS [duplicate] - linux

I have
while read $field1 $field2 $field3 $field4
do
$trimmed=$field2 | sed 's/ *$//g'
echo "$trimmed","$field3" >> new.csv
done < "$FEEDS"/"$DLFILE"
Now the problem is with read I can't make it split fields csv style, can I? See the input csv format below.
I need to get columns 3 and 4 out, stripping the padding from col 2, and I don't need the quotes.
Csv format with col numbers:
12 24(")25(,)26(")/27(Field2values) 42(")/43(,)/44(Field3 decimal values)
"Field1_constant_value","Field2values ",Field3,Field4
Field1 is constant and irrelevant. Data is quoted, goes from 2-23 inside the quotes.
Field2 fixed with from cols 27-41 inside quotes, with the data at the left and padded by spaces on the right.
Field3 is a decimal number with 1,2, or 3 digits before the decimal and 2 after, no padding. Starts at col 74.
Field4 is a date and I don't much care about it right now.

Yes, you can use read; all you've got to do is reset the environment variable IFS -- Internal Field Separator --, so that it won't split lines by its current value (default to whitespace), but by your own delimiter.
Considering an input file "a.csv", with the given contents:
1,2,3,4
2,3,4,5
6,3,2,1
You can do this:
IFS=','
while read f1 f2 f3 f4; do
echo "fields[$f1 $f2 $f3 $f4]"
done < a.csv
And the output is:
fields[1 2 3 4]
fields[2 3 4 5]
fields[6 3 2 1]

A couple of good starting points for you are here: http://backreference.org/2010/04/17/csv-parsing-with-awk/

Related

Bash script: filter columns based on a character

My text file should be of two columns separated by a tab-space (represented by \t) as shown below. However, there are a few corrupted values where column 1 has two values separated by a space (represented by \s).
A\t1
B\t2
C\sx\t3
D\t4
E\sy\t5
My objective is to create a table as follows:
A\t1
B\t2
C\t3
D\t4
E\t5
i.e. discard the 2nd value that is present after the space in column 1 for eg. in C\sx\t3 I can discard the x that is present after space and store the columns as C\t3.
I have tried a couple of things but with no luck.
I tried to cut the cols based on \t into independent columns and then cut the first column based on \s and join them again. However, it did not work.
Here is the snippet:
col1=(cut -d$'\t' -f1 $file | cut -d' ' -f1)
col2=(cut -d$'\t' -f1 $file)
myArr=()
for((idx=0;idx<${#col1[#]};idx++));do
echo "#{col1[$idx]} #{col2[$idx]}"
# I will append to myArr here
done
The output is appending the list of col2 to the col1 as A B C D E 1 2 3 4 5. And on top of this, my file is very huge i.e. 5,300,000 rows so I would like to avoid looping over all the records and appending them one by one.
Any advice is very much appreciated.
Thank you. :)
And another sed solution:
Search and replace any literal space followed by any number of non-TAB-characters with nothing.
sed -E 's/ [^\t]+//' file
A 1
B 2
C 3
D 4
E 5
If there could be more than one actual space in there just make it 's/ +[^\t]+//' ...
Assuming that when you say a space you mean a blank character then using any awk:
awk 'BEGIN{FS=OFS="\t"} {sub(/ .*/,"",$1)} 1' file
Solution using Perl regular expressions (for me they are easier than seds, and more portable as there are few versions of sed)
$ cat ls
A 1
B 2
C x 3
D 4
E y 5
$ cat ls |perl -pe 's/^(\S+).*\t(\S+)/$1 $2/g'
A 1
B 2
C 3
D 4
E 5
This code gets all non-empty characters from the front and all non-empty characters from after \t
Try
sed $'s/^\\([^ \t]*\\) [^\t]*/\\1/' file
The ANSI-C Quoting ($'...') feature of Bash is used to make tab characters visible as \t.
take advantage of FS and OFS and let them do all the hard work for you
{m,g}awk NF=NF FS='[ \t].*[ \t]' OFS='\t'
A 1
B 2
C 3
D 4
E 5
if there's a chance of leading edge or trailing edge spaces and tabs, then perhaps
mawk 'NF=gsub("^[ \t]+|[ \t]+$",_)^_+!_' OFS='\t' RS='[\r]?\n'

Find if the first 10 digits of two columns on csv file are matched in bash

I have a file which contains two columns (names.csv), values are separated by comma
,
a123456789-anything,a123456789-anything
b123456789-anything,b123456789-anything
c123456789-anything,c123456789-anything
d123456789-anything,d123456789-anything
e123456789-anything,e123456789-anything
e123456777-anything,e123456999-anything
These columns have values with 10 digits, which are unique identifiers, and some extra junk in the values (-anything).
I want to see if the columns have the prefix matched!
To verify the values on first and second column I use:
cat /home/names.csv | parallel --colsep ',' echo column 1 = {1} column 2 = {2}
Which print the values. Because the values are HEX digits, it is cumbersome to verify one by one by only reading. Is there any way to see if the 10 digits of each column pair are exact matches? They might contain special characters!
Expected output (example, but anything that says the columns are matched or not can work):
Matches (including first line):
,
a123456789-anything,a123456789-anything
b123456789-anything,b123456789-anything
c123456789-anything,c123456789-anything
d123456789-anything,d123456789-anything
e123456789-anything,e123456789-anything
Non-matches
e123456777-anything,e123456999-anything
Here's one way using awk. It prints every line where the first 10 characters of the first two fields match.
% cat /tmp/names.csv
,
a123456789-anything,a123456789-anything
b123456789-anything,b123456789-anything
c123456789-anything,c123456789-anything
d123456789-anything,d123456789-anything
e123456789-anything,e123456789-anything
e123456777-anything,e123456999-anything
% awk -F, 'substr($1,1,10)==substr($2,1,10)' /tmp/names.csv
,
a123456789-anything,a123456789-anything
b123456789-anything,b123456789-anything
c123456789-anything,c123456789-anything
d123456789-anything,d123456789-anything
e123456789-anything,e123456789-anything

rank grep result by entries' timestamp

I would like to rank log entries by the timestamp of each entry.
let's say my grep result is like this, with each entry having different number of fields and time on different number of columns:
a, 3, time:123
b, time:124, 4
c, time:122, 5
how should I pipe the result such that it looks like this?
c, time:122, 5
a, 3, time:123
b, time:124, 4
Would you try the following:
while IFS= read -r line; do
[[ $line =~ time:([0-9]+) ]] && printf "%s\t%s\n" "${BASH_REMATCH[1]}" "$line"
done < file | sort -n | cut -f 2-
It first extracts the time after the time: substring.
Then it prepends the time before the line using a tab as a delimiter.
It numerically sorts the lines.
Finally it cuts off the 1st field.
A general solution is:
for each line:
detect log format
extract timestamp column based on detected format
convert timestamp into sortable-form
print sortable-form + column delimiter + original line
pipe output of previous stage into something that sorts on the new first column
pipe output of previous stage into something that strips off the new first column

Reading multiline standard input in J

Now I use this code to read data from standard input:
print =: 1!:2&2
read =: 1!:1[3
in =. (read-.LR)-.CR
But it returns just a sequence of numbers, e.g. input:
2
3
4
5
Output:
2345
Number of numbers is unknown, but each is in the separate line
When reading with (1!:1) you read a stream of characters. You have to manipulate the stream to get your desired input.
For example. If you want to enter a list of line separated integers, you would read the list, then split it by LF, remove LF and then convert to integer. You can achieve the first two steps using cut (;._2) and the conversion using do (".):
in =: ".;._2 (1!:1) 3
If you want to enter a list of space separated integers, you would just use do, the splitting would be implied by the spaces:
in =: ". LF -.~ (1!:1) 3
trailing LF (if present) has to be removed before applying ". because do can't convert special characters.

Mapping lines to columns in *nix

I have a text file that was created when someone pasted from Excel into a text-only email message. There were originally five columns.
Column header 1
Column header 2
...
Column header 5
Row 1, column 1
Row 1, column 2
etc
Some of the data is single-word, some has spaces. What's the best way to get this data into column-formatted text with unix utils?
Edit: I'm looking for the following output:
Column header 1 Column header 2 ... Column header 5
Row 1 column 1 Row 1 column 2 ...
...
I was able to achieve this output by manually converting the data to CSV in vim by adding a comma to the end of each line, then manually joining each set of 5 lines with J. Then I ran the csv through column -ts, to get the desired output. But there's got to be a better way next time this comes up.
Perhaps a perl-one-liner ain't "the best" way, but it should work:
perl -ne 'BEGIN{$fields_per_line=5; $field_seperator="\t"; \
$line_break="\n"} \
chomp; \
print $_, \
$. % $fields_per_row ? $field_seperator : $line_break; \
END{print $line_break}' INFILE > OUTFILE.CSV
Just substitute the "5", "\t" (tabspace), "\n" (newline) as needed.
You would have to use a script that uses readline and counter. When the program reaches that line you want, use cut command and space as a dilimeter to get the word you want
counter=0
lineNumber=3
while read line
do
counter += 1
if lineNumber==counter
do
echo $line | cut -d" " -f 4
done
fi

Resources