Sample 1, with option -t:
f(){
mapfile -t aaa < /dev/stdin
echo -e ${aaa[@]}
}
echo -e "a\nb\nc" | f
Sample 2, without option -t:
f(){
mapfile aaa < /dev/stdin
echo -e ${aaa[@]}
}
echo -e "a\nb\nc" | f
The outputs are the same:
a b c
EDIT:
Why are there two spaces in the output of @cdarke's and @heemayl's answers?
echo -e "a\nb\nc" | f2 | od -b
0000000 141 012 040 142 012 040 143 012 012
0000011
Then I tried this: echo -e "a\tb\tc" | f2 | od -b
0000000 141 011 142 011 143 012 012
0000007
And there are no spaces there.
Thanks in advance!
To see the difference, quote ${aaa[@]}:
echo -e "${aaa[@]}"
Without quoting, the expansion is subject to word splitting according to IFS (and pathname expansion too). As the elements (strings) contain newlines, they go through word splitting.
Example:
$ f() { mapfile -t aaa < /dev/stdin; echo -e "${aaa[@]}" ;}
$ echo -e "a\nb\nc" | f
a b c
$ g() { mapfile aaa < /dev/stdin; echo -e "${aaa[@]}" ;}
$ echo -e "a\nb\nc" | g
a
b
c
The problem is with the echo:
f1(){
mapfile -t aaa < /dev/stdin
echo -e "${aaa[#]}" # note the quotes
}
echo -e "a\nb\nc" | f1
f2(){
mapfile aaa < /dev/stdin
echo -e "${aaa[#]}" # note the quotes
}
echo -e "a\nb\nc" | f2
Gives:
a b c
a
b
c
In the second case the newlines are preserved; in the original version, without the quotes, word splitting turned them into plain whitespace. This also answers the EDIT: "${aaa[@]}" joins the elements with single spaces, and without -t each element still ends in a newline, so the quoted expansion is "a\n b\n c\n"; the 040 bytes in the od output are those joining spaces. The tab-separated input is read as a single element (mapfile splits on newlines only), hence no joining spaces there.
Related
I just happened to be playing around with a few Linux commands and I found that echo -n "100" | wc -c outputs 3. I knew that 100 could be stored in a single byte as 1100100, so I could not understand why this happened. I guess that it is because of some terminal encoding, is it? I also found that if I touch test.txt, run echo -n "100" > test.txt, and then execute wc ./test.txt -c, I get the same output. Here my guess is to blame file encoding, am I right?
The string 100 is three characters long, hence wc giving you 3. If you left out the -n to echo, it would show 4, because echo would print a trailing newline too.
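For example, with and without -n:
$ echo -n "100" | wc -c
3
$ echo "100" | wc -c
4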
When you echo -n 100, you are showing a string with 3 characters.
When you want to show the character with ASCII value 100, use
echo -n "d"
# Check
echo -n "d" | xxd -b
I found the value "d" with man ascii. If you don't want to use the man page, use
printf "\\$(printf "%o" 100)"
# Check
printf "\\$(printf "%o" 100)" | xxd -b
# wc returns 1 here
printf "\\$(printf "%o" 100)" | wc -c
This is expected:
$ wc --help
...
-c, --bytes print the byte counts
-m, --chars print the character counts
...
$ man echo
...
-n do not output the trailing newline
...
$ echo -n 'abc' | wc -c
3
$ echo -n 'абс' | wc -c # Russian letters, 2 bytes each in UTF-8
6
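If you want characters rather than bytes, wc -m counts each multibyte character once (assuming a UTF-8 locale):
$ echo -n 'абс' | wc -m
3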
I need to extract a substring of a string in a bash script.
This is the code, with debugging echos:
echo "number:"
echo "$number"
echo "bb"
registers3=$(echo $number | grep -o -E '[0-9]+')
registers2="$(grep -oE '[0-9]+' <<< "$number")"
registers="${number//[^0-9]/}"
valor=$(grep -o "[0-9]" <<<"$number")
echo "valor:"
echo $valor
echo "reg:"
echo "$registers"
echo "reg2:"
echo "$registers2"
echo "reg3:"
echo "$registers3"
And this the output:
number:
/ > -------
420
/ >
bb
valor:
1 0 3 4 4 2 0
reg:
1034420
reg2:
1034
420
reg3:
1034
420
The problem is the special characters in $number.
Can you help me extract only the number? In this case it is "420".
Thanks!!!
EDIT:
If I put $number in a file (echo "$number" > file.txt) and open it with vi and :set list, I get:
^[[?1034h/ > -------$
420$
/ > $
Instead of this:
registers=$(echo $number | grep -o -E '[0-9]+')
try this:
registers="$(grep -oE '[0-9]+' <<< "$number")"
or even better, this one:
registers="${number//[^0-9]/}"
I'm trying to count the number of digits and letters in my file in Bash.
I know that I can use wc -c file to count the number of characters, but how can I restrict it to letters only and, secondly, to digits?
Here's a way completely avoiding pipes, just using tr and the shell's way to give the length of a variable with ${#variable}:
$ cat file
123 sdf
231 (3)
huh? 564
242 wr =!
$ NUMBERS=$(tr -dc '[:digit:]' < file)
$ LETTERS=$(tr -dc '[:alpha:]' < file)
$ ALNUM=$(tr -dc '[:alnum:]' < file)
$ echo ${#NUMBERS} ${#LETTERS} ${#ALNUM}
13 8 21
To count the number of letters and digits you can combine grep with wc:
grep -o '[a-z]' myfile | wc -l
grep -o '[0-9]' myfile | wc -l
(Use wc -l here: grep -o prints one match per line, so wc -c would also count the newline after each match.) With a little tweaking you can modify it to count runs of digits, alphabetic words, or alphanumeric words like this:
grep -oE '[a-z]+' myfile | wc -l
grep -oE '[0-9]+' myfile | wc -l
grep -oE '[[:alnum:]]+' myfile | wc -l
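For example, with the four-line sample file from the tr answer above:
$ grep -o '[0-9]' file | wc -l
13
$ grep -o '[a-z]' file | wc -l
8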
You can use sed to delete all characters that are not of the kind you are looking for, and then count the characters of the result.
# 1h;1!H appends every line to the hold space, so that the
# embedded newline characters can be removed as well
sed -n '1h;1!H;${;g;s/[^a-zA-Z]//g;p;}' myfile | wc -c
It's easy enough to do just digits as well.
sed -n '1h;1!H;${;g;s/[^0-9]//g;p;}' myfile | wc -c
Or why not both.
sed -n '1h;1!H;${;g;s/[^0-9a-zA-Z]//g;p;}' myfile | wc -c
Note that p appends a final newline, so each count is one higher than the number of matching characters.
There are a number of ways to approach analyzing the line, word, and character frequency of a text file in bash. Utilizing POSIX character classes (e.g. [[:upper:]] and so on), you can drill down to the frequency of each character type in a text file. Below is a simple script that reads from stdin, provides the normal wc output as its first line of output, and then outputs the number of upper, lower, digit, punct and whitespace characters.
#!/bin/bash

declare -i lines=0 words=0 chars=0
declare -i upper=0 lower=0 digit=0 punct=0

oifs="$IFS"
# Read each line with IFS set to newline only, preserving whitespace
while IFS=$'\n' read -r line; do
    # parse the line into words with the original IFS
    IFS=$oifs
    set -- $line
    IFS=$'\n'
    # add up lines, words and chars (+1 for the stripped newline)
    lines=$((lines + 1))
    words=$((words + $#))
    chars=$((chars + ${#line} + 1))
    # classify each character in the line
    for ((i = 0; i < ${#line}; i++)); do
        [[ ${line:i:1} =~ [[:upper:]] ]] && ((upper++))
        [[ ${line:i:1} =~ [[:lower:]] ]] && ((lower++))
        [[ ${line:i:1} =~ [[:digit:]] ]] && ((digit++))
        [[ ${line:i:1} =~ [[:punct:]] ]] && ((punct++))
    done
done

echo " $lines $words $chars"
echo " upper: $upper, lower: $lower, digit: $digit, punct: $punct, \
whitespace: $((chars - upper - lower - digit - punct))"
Test Input
$ cat dat/captnjackn.txt
This is a tale
Of Captain Jack Sparrow
A Pirate So Brave
On the Seven Seas.
(along with 2357 other pirates)
Example Use/Output
$ bash wcount3.sh <dat/captnjackn.txt
5 21 108
upper: 12, lower: 68, digit: 4, punct: 3, whitespace: 21
You can customize the script to give you as little or as much detail as you like. Let me know if you have any questions.
You can use tr to preserve only alphanumeric characters by combining the -c (complement) and -d (delete) flags. From there on, it's just a question of some piping:
$ tr -cd '[:alnum:]' < myfile.txt | wc -c
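With the four-line sample file shown in the first answer, this prints 21, matching ${#ALNUM} there:
$ tr -cd '[:alnum:]' < file | wc -c
21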
Can anybody help me with writing a multi-threaded shell script?
Basically I have two files: one contains around 10K lines (main_file) and the other around 200 lines (sub_file). The 200 lines are strings that occur repeatedly in the main file; I collected them into sub_file. The strings appear at random positions in main_file. I'm trying to create a separate file for each string using the loop below:
a=0
while IFS= read -r line
do
    a=$(($a + 1))
    users[$a]=$line
    egrep "$line" "$main_file" >> "$line"
done < "$sub_file"
Run as a single process this takes a long time, so I'm thinking of using multiple processes to complete the work in minimum time.
Help me out...
The tool you need for that is GNU parallel:
parallel egrep '{}' "$mainfile" '>' '{}' < "$sub_file"
You can adjust the number of jobs processed with the option -P:
parallel -P 4 egrep '{}' "$mainfile" '>' '{}' < "$sub_file"
Please see the manual for more info.
By the way, to make sure that you don't process a line twice, you can make the input unique:
awk '!a[$0]++' "$sub_file" | parallel -P 4 egrep '{}' "$mainfile" '>' '{}'
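If GNU parallel is not installed, a similar effect can be sketched with xargs -P (a widespread but non-POSIX extension); here the pattern is passed as a positional parameter so it is not re-interpreted by the shell:
awk '!a[$0]++' "$sub_file" |
  xargs -P 4 -I{} sh -c 'grep -E -- "$1" "$0" > "$1"' "$main_file" {}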
NOTE: Adapted from a previous post of mine. It is not directly applicable here, but very similar and easy to tweak.
I have a file 1.txt with the below contents.
-----cat 1.txt-----
1234
5678
1256
1234
1247
I have 3 more files in a folder
-----ls -lrt-------
A1.txt
A2.txt
A3.txt
The contents of those three files follow a similar format with different data values (all three files are tab-delimited):
-----cat A1.txt----
A X 1234 B 1234
A X 5678 B 1234
A X 1256 B 1256
-----cat A2.txt----
A Y 8888 B 1234
A Y 9999 B 1256
A X 1234 B 1256
-----cat A3.txt----
A Y 6798 C 1256
My objective is to search A1, A2 and A3 (only the 3rd column of the tab-delimited files) for the values in 1.txt,
and redirect the output to the file matches.txt as given below.
Code:
/home/A1.txt:A X 1234 B 1234
/home/A1.txt:A X 5678 B 1234
/home/A1.txt:A X 1256 B 1256
/home/A2.txt:A X 1234 B 1256
The following should work.
cat A*.txt | tr -s '\t' '|' > combined.dat
while read -r myline; do
    recset=$(echo "$myline" | cut -f3 -d '|' | tr -d '\r')
    var=$(grep -c "$recset" 1.txt)
    if [[ $var -ne 0 ]]; then
        echo "$myline" >> final.dat
    fi
done < combined.dat
Using AWK
awk 'NR==FNR{a[$0]=1}$3 in a{print FILENAME":"$0}' 1.txt A* > matches.txt
For pipe-delimited input:
awk -F'|' 'NR==FNR{a[$0]=1}$3 in a{print FILENAME":"$0}' 1.txt A* > matches.txt
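The first command, with comments (functionally identical):
awk '
    NR == FNR { a[$0] = 1; next }        # 1st file (1.txt): remember each value
    $3 in a   { print FILENAME ":" $0 }  # later files: print if column 3 was seen
' 1.txt A* > matches.txt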
I want to convert binary data to hexadecimal, just that, no fancy formatting and all. hexdump seems too clever, and it "overformats" for me. I want to take x bytes from the /dev/random and pass them on as hexadecimal.
Preferably I'd like to use only standard Linux tools, so that I don't need to install it on every machine (there are many).
Perhaps use xxd:
% xxd -l 16 -p /dev/random
193f6c54814f0576bc27d51ab39081dc
Watch out!
hexdump and xxd give the results in a different endianness!
$ echo -n $'\x12\x34' | xxd -p
1234
$ echo -n $'\x12\x34' | hexdump -e '"%x"'
3412
Simply explained: hexdump's %x reads the input as multi-byte words in the host's (little-endian) byte order, while xxd dumps the bytes in input order. Big-endian vs. little-endian :D
With od (GNU systems):
$ echo abc | od -A n -v -t x1 | tr -d ' \n'
6162630a
With hexdump (BSD systems):
$ echo abc | hexdump -ve '/1 "%02x"'
6162630a
From Hex dump, od and hexdump:
"Depending on your system type, either or both of these two utilities will be available--BSD systems deprecate od for hexdump, GNU systems the reverse."
Perhaps you could write your own small tool in C, and compile it on-the-fly:
#include <stdio.h>
#include <unistd.h>

int main(void) {
    unsigned char data[1024];
    ssize_t numread, i;

    /* read(2) returns the number of bytes read, 0 on EOF, -1 on error */
    while ((numread = read(0, data, sizeof data)) > 0) {
        for (i = 0; i < numread; i++) {
            printf("%02x ", data[i]);
        }
    }
    return 0;
}
And then feed it from the standard input:
cat /bin/ls | ./a.out
You can even embed this small C program in a shell script using the heredoc syntax.
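A minimal sketch of that embedding, assuming a C compiler is available as cc on the target machines:
#!/bin/sh
tmp=$(mktemp) || exit 1
# compile the hex dumper from a heredoc, run it on some input, clean up
cc -x c -o "$tmp" - <<'EOF'
#include <stdio.h>
#include <unistd.h>
int main(void) {
    unsigned char data[1024];
    ssize_t numread, i;
    while ((numread = read(0, data, sizeof data)) > 0)
        for (i = 0; i < numread; i++)
            printf("%02x ", data[i]);
    return 0;
}
EOF
"$tmp" < /bin/ls
rm -f "$tmp"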
All the solutions seem to be hard to remember or too complex. I find using printf the shortest one:
$ printf '%x\n' 256
100
But as noted in the comments, this is not what the author wants, so to be fair, below is the full answer.
... to use the above to output an actual binary data stream:
printf '%x\n' $(cat /dev/urandom | head -c 5 | od -An -vtu1)
What it does:
printf '%x\n' ... - prints a sequence of integers in hex, one per line; e.g. printf '%x,' 1 2 3 prints 1,2,3,
$(...) - command substitution: captures the output of a shell command
cat /dev/urandom - outputs random binary data
head -c 5 - limits the binary data to 5 bytes
od -An -vtu1 - octal dump command; here it converts the bytes to unsigned decimal integers
As a testcase ('a' is 61 hex, 'p' is 70 hex, ...):
$ printf '%x\n' $(echo "apple" | head -c 5 | od -An -vtu1)
61
70
70
6c
65
Or to test individual binary bytes: let's feed it decimal 61 (the '=' character); the '\\x%x' format produces the binary byte. The command below correctly outputs 3d (hex for decimal 61):
$ printf '%x\n' $(echo -ne "$(printf '\\x%x' 61)" | head -c 5 | od -An -vtu1)
3d
If you need a large stream (no newlines) you can use tr and xxd (part of Vim) for byte-by-byte conversion.
head -c1024 /dev/urandom | xxd -p | tr -d $'\n'
Or you can use hexdump (BSD) for word-by-word conversion.
head -c1024 /dev/urandom | hexdump -e '"%x"'
Note that the difference is endianness.
dd + hexdump will also work:
dd bs=1 count=1 if=/dev/urandom 2>/dev/null | hexdump -e '"%x"'
Sometimes perl5 works better for portability if you target more than one platform. It comes with every Linux distribution and Unix OS. You can often find it in container images where other tools like xxd or hexdump are not available. Here's how to do the same thing in Perl:
$ head -c8 /dev/urandom | perl -0777 -ne 'print unpack "H*"'
5c9ed169dabf33ab
$ echo -n $'\x01\x23\xff' | perl -0777 -ne 'print unpack "H*"'
0123ff
$ echo abc | perl -0777 -ne 'print unpack "H*"'
6162630a
Note that this uses slurp mode (-0777), which causes Perl to read the entire input into memory; that may be suboptimal when the input is large.
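For large inputs, one way to stream instead of slurping is to read fixed-size records (here 4096 bytes, an arbitrary choice):
$ head -c 1048576 /dev/urandom | perl -ne 'BEGIN { $/ = \4096 } print unpack "H*", $_'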
These three commands will print the same (0102030405060708090a0b0c):
n=12
echo "$a" | xxd -l "$n" -p
echo "$a" | od -N "$n" -An -tx1 | tr -d " \n" ; echo
echo "$a" | hexdump -n "$n" -e '/1 "%02x"'; echo
Given that n=12 and $a is the byte values from 1 to 26:
a="$(printf '%b' "$(printf '\\0%o' {1..26})")"
That could be used to get $n random byte values in each program:
xxd -l "$n" -p /dev/urandom
od -vN "$n" -An -tx1 /dev/urandom | tr -d " \n" ; echo
hexdump -vn "$n" -e '/1 "%02x"' /dev/urandom ; echo