I have 2 variables, NUMS and TITLES.
NUMS contains the string
1
2
3
TITLES contains the string
A
B
C
How do I get output that looks like:
1 A
2 B
3 C
paste -d' ' <(echo "$NUMS") <(echo "$TITLES")
Having multi-line strings in variables suggests that you are probably doing something wrong. But you can try
paste -d ' ' <(echo "$nums") - <<<"$titles"
The basic syntax of paste is to read two or more file names; you can use a process substitution in place of a file name anywhere, and you can use a here string or other redirection to supply one of the "files" on standard input (where the file name is then conventionally replaced with the pseudo-file -).
The default column separator from paste is a tab; you can replace it with a space or some other character with the -d option.
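For example, here is a minimal demonstration (printf stands in for wherever your data really comes from) that reads one column from standard input via the pseudo-file - and the other from a process substitution:
printf '1\n2\n3\n' | paste -d' ' - <(printf 'A\nB\nC\n')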
You should avoid upper case for your private variables; see also Correct Bash and shell script variable capitalization
Bash variables can contain even very long strings, but this is often clumsy and inefficient compared to reading straight from a file or pipeline.
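For instance, if the same data lived in two (hypothetical) files nums.txt and titles.txt, no variables would be needed at all:
paste -d' ' nums.txt titles.txt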
Convert them to arrays, like this:
NUMS=($NUMS)
TITLES=($TITLES)
Then loop over the indices of either array, say NUMS, like this:
for i in "${!NUMS[@]}"; do
    # echo the desired output
    echo "${NUMS[$i]} ${TITLES[$i]}"
done
Awk alternative:
awk 'FNR==NR { map[FNR]=$0; next } { print map[FNR] " " $0 }' <(echo "$NUMS") <(echo "$TITLES")
For the first file/variable (FNR==NR), set up an array called map with the file record number (FNR) as the index and the line as the value. Then, for the second file, print the corresponding map entry followed by the line, separated by a space.
Related
I have a directory in which the file names are like
Abc_def_ijk.txt-1
Abc_def_ijk.txt-2
Abc_def_ijk.txt-3
Abc_def_ijk.txt-4
Abc_def_ijk.txt-5
Abc_def_ijk.txt-6
Abc_def_ijk.txt-7
Abc_def_ijk.txt-8
Abc_def_ijk.txt-9
I'd like to divide them into 4 variables as below:
v1=Abc_def_ijk.txt-1,Abc_def_ijk.txt-5,Abc_def_ijk.txt-9
v2=Abc_def_ijk.txt-2,Abc_def_ijk.txt-6
v3=Abc_def_ijk.txt-3,Abc_def_ijk.txt-7
v4=Abc_def_ijk.txt-4,Abc_def_ijk.txt-8
If the number of files increases, the extra files should go into the above variables in the same fashion. I'm looking for an awk one-liner to achieve this.
I would do it using GNU AWK the following way. Let file.txt content be
Abc_def_ijk.txt-1
Abc_def_ijk.txt-2
Abc_def_ijk.txt-3
Abc_def_ijk.txt-4
Abc_def_ijk.txt-5
Abc_def_ijk.txt-6
Abc_def_ijk.txt-7
Abc_def_ijk.txt-8
Abc_def_ijk.txt-9
then
awk '{arr[NR%4]=arr[NR%4] "," $0}END{print substr(arr[1],2);print substr(arr[2],2);print substr(arr[3],2);print substr(arr[0],2)}' file.txt
output
Abc_def_ijk.txt-1,Abc_def_ijk.txt-5,Abc_def_ijk.txt-9
Abc_def_ijk.txt-2,Abc_def_ijk.txt-6
Abc_def_ijk.txt-3,Abc_def_ijk.txt-7
Abc_def_ijk.txt-4,Abc_def_ijk.txt-8
Explanation: I store lines in the array arr and decide where to put a given line based on the line number (NR) modulo (%) four (4). I concatenate what is currently stored (an empty string if nothing so far) with a , and the content of the current line ($0); this results in a leading comma, which I remove using the substr function, i.e. by starting at the 2nd character.
(tested in GNU Awk 5.0.1)
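Since the question mentions that the number of files may grow, here is a sketch of a generalized variant (same logic, just with the bucket count in a variable n and a loop in the END block):
awk -v n=4 '{arr[NR%n]=arr[NR%n] "," $0} END{for(i=1;i<=n;i++) print substr(arr[i%n],2)}' file.txt
With n=4 this prints arr[1], arr[2], arr[3], arr[0], i.e. the same output as above.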
I have a very large csv file that is too big to open in excel for this operation.
I need to replace a specific string in approx. 6000 of the 1.5 million records in the CSV; the string itself is in comma-separated format, like so:
ABC,FOO.BAR,123456
There are other columns on either side that are of no concern; I only include enough of the record to make sure the final part (the numbers) is unique.
I have another file with the string to replace and the replacement string, like (for the above):
"ABC,FOO.BAR,123456","ABC,FOO.BAR,654321"
So in the case above, 123456 is being replaced by 654321. A simple (yet maddeningly slow) way to do this is to open both documents in Notepad++, find the first string, and replace it with the second, but with over 6000 records this isn't great.
I was hoping someone could give advice on a scripting solution? e.g.:
$file1 = base.csv
$file2 = replace.csv
For each row in $file2 {
awk '{sub(/$file2($firstcolumn)/,$file2($Secondcolumn)' $file1
}
Though I'm not entirely sure how to adapt awk to do an operation like this...
EDIT: Sorry, I should have been more specific: the data in my replacement CSV is only in two columns; two raw strings!
It would be easier, of course, if your delimiter were not used within the fields...
You can do it in two steps: create a sed script from the lookup file, then use it on the main data file for the replacements.
for example,
(this assumes there are no escaped quotes in the fields, which may not hold)
$ awk -F'","' '{print "s/" $1 "\"/\"" $2 "/"}' lookup_file > replace.sed
$ sed -f replace.sed data_file
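For the sample pair from the question, the generated replace.sed would contain the single command
s/"ABC,FOO.BAR,123456"/"ABC,FOO.BAR,654321"/
(note that the search side is still treated as a regex by sed, so a character like . matches more than a literal dot).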
awk -F\" '
NR==FNR { subst[$2]=$4; next }
{
for (s in subst) {
pos = index($0, s)
if (pos) {
$0 = substr($0, 1, pos-1) subst[s] substr($0, pos + length(s))
break
}
}
print
}
' "$file2" "$file1" # > "$file1.$$.tmp" && mv "$file1.$$.tmp" "$file1"
The part after the # shows how you could replace the input data file with the output.
The block associated with NR==FNR is only executed for the first input file, the one with the search and replacement strings.
subst[$2]=$4 builds an associative array (dictionary): the key is the search string, the value the replacement string.
Fields $2 and $4 are the search string and the replacement string, respectively, because Awk was instructed to break the input into fields by " (-F\"); note that this assumes that your strings do not contain escaped embedded " chars.
The remaining block then processes the data file:
For each input line, it loops over the search strings and looks for a match on the current line:
Once a match is found, the replacement string is substituted for the search string, and matching stops.
print simply prints the (possibly modified) line.
Note that since you want literal string replacements, regex-based functions such as sub() are explicitly avoided in favor of literal string-processing functions index() and substr().
As an aside: since you say there are columns on either side in the data file, consider making the search/replacement strings more robust by placing , on either side of them (this could be done inside the awk script).
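A sketch of that tweak (hypothetical, and it assumes the fields never sit at the very start or end of a line), changing only the first block:
NR==FNR { subst["," $2 ","] = "," $4 ","; next }
The commas become part of both the search key and its replacement, so they are preserved in the output.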
I would recommend using a language with a CSV parsing library rather than trying to do this with shell tools. For example, Ruby:
require 'csv'
replacements = CSV.open('replace.csv','r').to_h
File.open('base.csv', 'r').each_line do |line|
replacements.each do |old, new|
line.gsub!(old) { new }
end
puts line
end
Note that Enumerable#to_h requires Ruby v2.1+; replace it with this for older Rubies:
replacements = Hash[*CSV.open('replace.csv','r').to_a.flatten]
You only really need CSV for the replacements file; this assumes you can apply the substitutions to the other file as plain text, which speeds things up a bit and avoids having to parse the old/new strings out into fields themselves.
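Assuming the script above is saved as (hypothetically) replace.rb, it could be run from a shell like this, redirecting the result to a new file:
ruby replace.rb > base.fixed.csv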
I have a file containing a lot of words separated by pipes. I would like to have a script (written in bash or in any other programming language) that is able to replace every word with an incremental unique integer (something like an ID).
From an input like this:
aaa|ccccc|ffffff|iii|j
aaa|ddd|ffffff|iii|j
bb|eeee|hhhhhh|iii|k
I'd like to have something like this
1|3|6|8|9
1|4|6|8|9
2|5|7|8|10
That is: aaa has been replaced by 1, bb has been replaced by 2, and so on.
How to do this? Thanks!
awk to the rescue...
This will do the numbering row-wise; I'm not sure it's important enough to make it columnar.
awk -F "|" -vOFS="|" '{
line=sep="";
for(i=1;i<=NF;i++) {
if(!a[$i])a[$i]=++c;
line=line sep a[$i];
sep=OFS
}
print line
}' words
Output:
1|2|3|4|5
1|6|3|4|5
7|8|9|4|10
To get the word associations into another file, you can replace
if(!a[$i])a[$i]=++c;
with
if(!a[$i]){
a[$i]=++c;
print $i"="a[$i] > "assoc"
}
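With the sample input above, the assoc file would then contain:
aaa=1
ccccc=2
ffffff=3
iii=4
j=5
ddd=6
bb=7
eeee=8
hhhhhh=9
k=10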
You can define an associative array:
declare -A array
use the words as keys and an incremental number as the value,
array[aaa]=$n
and then replace the original words with the values.
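Here is a complete sketch of that approach (it assumes bash 4+ for associative arrays and, like the awk answer, an input file named words):
declare -A ids
n=0
while IFS='|' read -r -a fields; do
    out=()
    for w in "${fields[@]}"; do
        # assign the next free ID the first time a word is seen
        [[ -z ${ids[$w]} ]] && ids[$w]=$((++n))
        out+=("${ids[$w]}")
    done
    (IFS='|'; echo "${out[*]}")   # join the IDs with | again
done < words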
I want to search for the strings starting with "double" in a text file and pass the line numbers to two variables (suppose I know there must be two lines containing "double"). Next, I want to get the numbers in those strings and pass them to two other variables. After that, I want to delete those lines from the text. Could you tell me how to do it?
In order to store the line numbers in 2 variables, var1 and var2, try this:
read var1 var2 <<< "$(grep -Fnm 2 double file | cut -d: -f1 | tr '\n' ' ')"
(The tr is needed because bash does not word-split a here-string: without it, read would only see grep's first output line and var2 would stay empty.)
Now var1 and var2 contain the line numbers of the lines containing the word double.
To "pass them" to two other variables:
foo="$var1"
bar="$var2"
To delete the lines, use sed as shown below:
sed "${var1}d;${var2}d;" file
noob here, sorry if a repost. I am extracting a string from a file, and end up with a line, something like:
abcdefg:12345:67890:abcde:12345:abcde
Let's say it's in a variable named testString
The length of the values between the colons is not constant, but I want to save the number between the 2nd and 3rd colons (as a string is fine) to a variable. So in this case I'd end up with my new variable, let's call it extractedNum, being 67890. I assume I have to use sed but have never used it, and I'm trying to get my head around it...
Can anyone help? Cheers
On a side note, I am extracting the entire line from the file by searching for the 1st string of characters, in this case the abcdefg part.
Pure Bash using an array:
testString="abcdefg:12345:67890:abcde:12345:abcde"
IFS=':'
array=( $testString )
echo "value = ${array[2]}"
The output:
value = 67890
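A variant that avoids changing IFS for the rest of the script (the IFS=: assignment applies only to the read command, and _ serves as a throwaway variable):
IFS=: read -r _ _ extractedNum _ <<< "$testString"
echo "value = $extractedNum"    # value = 67890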
Here's another pure bash way. Works fine when your input is reasonably consistent and you don't need much flexibility in which section you pick out.
extractedNum="${testString#*:}" # Remove through first :
extractedNum="${extractedNum#*:}" # Remove through second :
extractedNum="${extractedNum%%:*}" # Remove from next : to end of string
You could also filter the file while reading it, in a while loop for example:
while IFS=' ' read -r col line ; do
# col has the column you wanted, line has the whole line
# ... do whatever you need with them here ...
done < <(sed -e 's/\([^:]*:\)\{2\}\([^:]*\).*/\2 &/' "yourfile")
The sed command picks out the field between the 2nd and 3rd colons and sets it apart from the entire line with a space. If you don't need the entire line, just remove the space+& from the replacement and drop the line variable from the read. You can pick a different field by changing the number in the \{2\} bit, i.e. the number of fields to skip. (Put the command in double quotes if you want to use a variable there.)
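For example, to pick the column with a variable (the double quotes let the shell expand $col inside the sed expression; everything else is unchanged):
col=2
sed -e "s/\([^:]*:\)\{$col\}\([^:]*\).*/\2 &/" "yourfile"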
You can use cut for this kind of stuff. Here you go:
VAR=$(echo abcdefg:12345:67890:abcde:12345:abcde |cut -d":" -f3); echo $VAR
For the fun of it, this is how I would (not) do this with sed, but I'm sure there's easier ways. I guess that'd be a question of my own to future readers ;)
echo abcdefg:12345:67890:abcde:12345:abcde |sed -e "s/[^:]*:[^:]*:\([^:]*\):.*/\1/"
This should work for you; the key part is awk -F: '$0=$3'. It splits each line on : and assigns the third field to $0; the value of that assignment is then used as the pattern, and since it is a non-empty, non-zero string it is true, so awk's default action of printing the (now replaced) line fires. (Beware: a third field that is empty or 0 would not print.)
NewVar=$(getTheLineSomehow...|awk -F: '$0=$3')
example:
kent$ newVar=$(echo "abcdefg:12345:67890:abcde:12345:abcde"|awk -F: '$0=$3')
kent$ echo $newVar
67890
if your text was stored in var testString, you could:
kent$ echo $testString
abcdefg:12345:67890:abcde:12345:abcde
kent$ newVar=$(awk -F: '$0=$3' <<<"$testString")
kent$ echo $newVar
67890