Now I use this code to read data from standard input:
print =: 1!:2&2
read =: 1!:1[3
in =. (read-.LF)-.CR
But it returns just a sequence of numbers, e.g. input:
2
3
4
5
Output:
2345
The number of numbers is unknown, but each one is on a separate line.
When reading with (1!:1) you read a stream of characters. You have to manipulate the stream to get your desired input.
For example, if you want to enter a list of line-separated integers, you would read the input, split it on LF, remove the LF characters, and then convert to integers. You can achieve the first two steps using cut (;._2) and the conversion using do ("."):
in =: ".;._2 (1!:1) 3
If you want to enter a list of space-separated integers, you can just use do; the splitting is implied by the spaces:
in =: ". LF -.~ (1!:1) 3
The trailing LF (if present) has to be removed before applying ". because do can't convert special characters.
I would like to append '-' to the end of each word match. But the number of '-' characters appended should depend on the length of the match, so that the total number of characters in that line remains constant.
As shown in the example below, the total number of characters should be 6.
e.g.
ab
xyz
abcde
The above text should be replaced to:
ab----
xyz---
abcde-
You can use \= to substitute with an expression, see :h sub-replace-expression.
When the substitute string starts with \=, the remainder is interpreted as an expression.
The submatch() function can be used to obtain matched text. The whole matched text can be accessed with submatch(0). The text matched with the first pair of () with submatch(1). Likewise for further sub-matches in ().
So you can achieve it like this:
:[range]s//\=submatch(0) . repeat('-', 6-strlen(submatch(0)))/
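If you want to sanity-check the result outside Vim, the same padding can be sketched with awk, treating each line as a single word as in the example above (an alternative approach, not part of the Vim answer; the target width 6 is hard-coded and words.txt is a placeholder file name):
awk '{ pad = ""; for (i = length($0); i < 6; i++) pad = pad "-"; print $0 pad }' words.txt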
I have a file (names.csv) which contains two columns; the values are separated by a comma:
,
a123456789-anything,a123456789-anything
b123456789-anything,b123456789-anything
c123456789-anything,c123456789-anything
d123456789-anything,d123456789-anything
e123456789-anything,e123456789-anything
e123456777-anything,e123456999-anything
These columns hold values that start with a 10-digit unique identifier, followed by some extra junk in the values (-anything).
I want to see whether the prefixes in the two columns match!
To verify the values in the first and second columns I use:
cat /home/names.csv | parallel --colsep ',' echo column 1 = {1} column 2 = {2}
This prints the values. Because the values are hex digits, it is cumbersome to verify them one by one just by reading. Is there any way to see whether the 10-digit prefixes of each column pair are exact matches? They might contain special characters!
Expected output (example, but anything that says the columns are matched or not can work):
Matches (including first line):
,
a123456789-anything,a123456789-anything
b123456789-anything,b123456789-anything
c123456789-anything,c123456789-anything
d123456789-anything,d123456789-anything
e123456789-anything,e123456789-anything
Non-matches
e123456777-anything,e123456999-anything
Here's one way using awk. It prints every line where the first 10 characters of the first two fields match.
% cat /tmp/names.csv
,
a123456789-anything,a123456789-anything
b123456789-anything,b123456789-anything
c123456789-anything,c123456789-anything
d123456789-anything,d123456789-anything
e123456789-anything,e123456789-anything
e123456777-anything,e123456999-anything
% awk -F, 'substr($1,1,10)==substr($2,1,10)' /tmp/names.csv
,
a123456789-anything,a123456789-anything
b123456789-anything,b123456789-anything
c123456789-anything,c123456789-anything
d123456789-anything,d123456789-anything
e123456789-anything,e123456789-anything
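If you also want the non-matching lines called out instead of filtered away, a small variation of the same substr comparison (just a sketch) labels each line:
awk -F, 'substr($1,1,10)==substr($2,1,10){print "match:",$0; next} {print "MISMATCH:",$0}' /tmp/names.csv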
I have
while read $field1 $field2 $field3 $field4
do
$trimmed=$field2 | sed 's/ *$//g'
echo "$trimmed","$field3" >> new.csv
done < "$FEEDS"/"$DLFILE"
Now the problem is that with read I can't make it split the fields CSV-style, can I? See the input CSV format below.
I need to get columns 3 and 4 out, stripping the padding from col 2, and I don't need the quotes.
CSV format with column numbers:
Col 1 ("), 2-23 (Field1 value), 24 ("), 25 (,), 26 ("), 27-41 (Field2 values), 42 ("), 43 (,), 44+ (Field3 decimal values)
"Field1_constant_value","Field2values ",Field3,Field4
Field1 is constant and irrelevant. Data is quoted, goes from 2-23 inside the quotes.
Field2 is fixed-width, from cols 27-41 inside quotes, with the data at the left and padded by spaces on the right.
Field3 is a decimal number with 1, 2, or 3 digits before the decimal point and 2 after, no padding. Starts at col 74.
Field4 is a date and I don't much care about it right now.
Yes, you can use read; all you've got to do is reset the shell variable IFS (the Internal Field Separator) so that it won't split lines by its current value (whitespace by default), but by your own delimiter.
Considering an input file "a.csv", with the given contents:
1,2,3,4
2,3,4,5
6,3,2,1
You can do this:
IFS=','
while read f1 f2 f3 f4; do
    echo "fields[$f1 $f2 $f3 $f4]"
done < a.csv
And the output is:
fields[1 2 3 4]
fields[2 3 4 5]
fields[6 3 2 1]
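Note that IFS=',' as written above changes the field separator for everything that follows in the script. A common variant (a sketch of the same idea) scopes the assignment to the read command itself and adds -r so backslashes in the data are taken literally:
while IFS=',' read -r f1 f2 f3 f4; do
    echo "fields[$f1 $f2 $f3 $f4]"
done < a.csv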
A good starting point for you is here: http://backreference.org/2010/04/17/csv-parsing-with-awk/
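Applied to the loop in the question, a minimal sketch (assuming the layout shown above, where field 2 is quoted and right-padded with spaces and field 3 is unquoted) could look like this:
while IFS=',' read -r field1 field2 field3 field4; do
    field2=${field2%\"}; field2=${field2#\"}     # drop the surrounding quotes
    trimmed=$(sed 's/ *$//' <<< "$field2")       # strip the right padding
    echo "$trimmed,$field3" >> new.csv
done < "$FEEDS/$DLFILE"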
I have a string "213_str_12". I want to find the digits only within the first 5 characters, so I would get only 213. Is that possible in bash?
Using only bash you can do
s='213_str_12'
f5="${s:0:5}"
echo "${f5//[^0-9]/}"
213
f5 contains the first 5 chars, and the second expansion then replaces all the non-digits with an empty string.
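As a usage sketch, the same two expansions can be wrapped in a small helper function (the function name is made up for illustration):
digits_in_first5() {
    local f5="${1:0:5}"               # first 5 characters of the argument
    printf '%s\n' "${f5//[^0-9]/}"    # keep only the digits
}
digits_in_first5 '213_str_12'    # prints 213
digits_in_first5 'ab1c2_99x'     # prints 12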
I have a huge data set of entries like these:
(21, 2, '23.5R25 ETADT', 'description, with a comma'),
(22, 1, '26.5R25 ETADT', 'Description without a comma'),
(23, 5, '20.5R20.5', 'Another description with ; semicolumn'),
I'm trying to replace every comma in the list with a tab, excluding the commas within the single quotes and also excluding the trailing comma at the end of each line.
So the examples entries should become:
(21[TAB]2[TAB]'23.5R25 ETADT'[TAB]'description, with a comma'),
(22[TAB]1[TAB]'26.5R25 ETADT'[TAB]'Description without a comma'),
(23[TAB]5[TAB]'20.5R20.5'[TAB]'Another description with ; semicolumn'),
I've got like 6000 rows of data like this. The tabs allow me to tell Excel to import the elements of these entries into different columns.
The regex I've tried is this: [ ]*,[ ]*
But this regex selects all the commas, even the ones within the single quotes.
It looks as though each of your lines has 4 elements within parentheses, and it looks like only the last 2 elements use single quotes. If those assumptions hold, I've tested the following in Notepad++:
"Find what :" ^\(([^,]*),\s*([^,]*),\s*'([^']*)'\s*,\s*
"Replace with :" \(\1\t\2\t'\3'\t
EDIT:
The search regex depends on the 4-column model, with only the last two elements having single quotes. Visually, this is how it works:
^\(: Finds an opening parenthesis
([^,]*): Captures non-comma characters which will be all of element 1
,\s*: Matches a comma and any trailing spaces
([^,]*): Captures non-comma characters which will be all of element 2
,\s*: Matches a comma and any trailing spaces
'([^']*)': Captures the string in single quotes which will be all of element 3
\s*,\s*: Matches a comma and all surrounding spaces
The rest of the string is ignored; there are no more commas to replace, so we only rewrite the part of the line that was matched.
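If you'd rather run the replacement from the command line instead of Notepad++, the same pattern translates to sed; this is a hedged sketch (data.sql is a placeholder file name, " *" stands in for \s*, and a literal tab is passed in via $'\t' because \t in the replacement is not portable across sed versions):
TAB=$'\t'
sed -E "s/^\(([^,]*), *([^,]*), *'([^']*)' *, */(\1${TAB}\2${TAB}'\3'${TAB}/" data.sql > data.tsv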