Cygwin bash: read file word by word - linux

I want to read a text file word by word. Problem: some words contain "/*", and such a word causes the script to list the files in the root directory instead. I tried:
for word in $(< file)
do
printf "[%s]\n" "$word"
done
And several other combinations with echo/cat/etc...
For this file:
/* example file
I get following output:
[/bin]
[/cygdrive]
[/Cygwin.bat]
...
[example]
[file]
Should be easy but it's driving me nuts.

You need to turn off pathname expansion (globbing). Run a new shell with bash -f and try again. See http://wiki.bash-hackers.org/syntax/expansion/globs or dive into the manpage with man bash, maybe do man bash | col -b >bash.txt.
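If you'd rather not start a new shell, the same option can be toggled inside the script with set -f / set +f; a minimal sketch (my addition, not part of the original answer):
set -f    # disable pathname expansion so /* stays literal
for word in $(< file)
do
printf "[%s]\n" "$word"
done
set +f    # re-enable pathname expansion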

How about this solution: the double quotes around $(< file) stop * from being expanded, and sed is used to format the output as required:
for word in "$(< file)"
do
echo "$word" | sed -E 's/(\S*)(\s)/[\1]\2\n/g'
done
Output:
[/*]
[example]
[file]

This may help:
# Skip blank lines and comment lines beginning with a hash (#).
cat "$CONFIG_FILE" | while read -r LINE
do
first_char=$(echo "$LINE" | cut -c1)
if [ "${first_char}" = "#" ]
then
echo "Skip line with first_char= >>${first_char}<<"
else
:
echo "process line: $LINE" ;
fi
done
Another way is to use a case statement; a sketch of that variant follows.
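For completeness, here is roughly what that case variant might look like (my sketch, reusing the same skip/process messages; note the comment pattern must be quoted):
while read -r LINE
do
case "$LINE" in
''|'#'*) echo "Skip blank or comment line" ;;
*) echo "process line: $LINE" ;;
esac
done < "$CONFIG_FILE"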

How about this one?
while read -a a; do printf '[%s]\n' "${a[@]}"; done < file
Output:
[/*]
[example]
[file]

Related

linux sed grep -P replace string with newline and taking next line into consideration

I have a file that was generated, and I need to replace the last "," with "" so it will be valid JSON. The problem is that I can't figure out how to do it with sed, or even with grep piping to something else. I'm really stumped. Any help would be appreciated.
test.json
[
{MANY OTHER RECORDS, MAKING FILE 3.5Gig (making sed fail because of memory, so newlines were added)},
{"ID":"57705e4a-158c-4d4e-9e07-94892acd98aa","USERNAME":"jmael","LOGINTIMESTAMP":"2021-11-30"},
{"ID":"b8b67609-50ed-4cdc-bbb4-622c7e6a8cd2","USERNAME":"henrydo","LOGINTIMESTAMP":"2021-12-15"},
{"ID":"a44973d0-0ec1-4252-b9e6-2fd7566c6f7d","USERNAME":"null","LOGINTIMESTAMP":"2021-10-31"},
]
Of course, using grep with -P matches what I need to replace
grep -Pzo '"},\n]' test.json
An efficient solution would be to use perl to read the last n bytes of the file, determine the position of the superfluous comma in those bytes (for example with a regex), and then replace this comma with a space character:
perl -e '
    $n = 16;                        # how many bytes to read
    open $fh, "+<", $ARGV[0];       # open file in read & write mode
    seek $fh, -$n, 2;               # go to the end minus some bytes
    $n = read $fh, $str, $n;        # load the end of the file
    if ( $str =~ /,\s*]\s*$/s ) {   # get position of comma
        seek $fh, -($n - $-[0]), 1; # go to position of comma
        print $fh " ";              # replace comma with space char
    }
    close $fh;                      # close file
' log.json
The strong point of this solution is that it reads only a few bytes of the file to do the replacement, which keeps memory consumption near zero and avoids reading through the whole file.
Using GNU sed
$ sed -Ez 's/([^]]*),/\1/' test.json
[
{MANY OTHER RECORDS, MAKING FILE 3.5Gig (making sed fail because of memory, so newlines were added)},
{"ID":"57705e4a-158c-4d4e-9e07-94892acd98aa","USERNAME":"jmael","LOGINTIMESTAMP":"2021-11-30"},
{"ID":"b8b67609-50ed-4cdc-bbb4-622c7e6a8cd2","USERNAME":"henrydo","LOGINTIMESTAMP":"2021-12-15"},
{"ID":"a44973d0-0ec1-4252-b9e6-2fd7566c6f7d","USERNAME":"null","LOGINTIMESTAMP":"2021-10-31"}
]
Remove last comma in a file with GNU sed:
sed -zE 's/,([^,]*)$/\1/' file
Output to stdout:
[
{MANY OTHER RECORDS, MAKING FILE 3.5Gig (making sed fail because of memory, so newlines were added)},
{"ID":"57705e4a-158c-4d4e-9e07-94892acd98aa","USERNAME":"jmael","LOGINTIMESTAMP":"2021-11-30"},
{"ID":"b8b67609-50ed-4cdc-bbb4-622c7e6a8cd2","USERNAME":"henrydo","LOGINTIMESTAMP":"2021-12-15"},
{"ID":"a44973d0-0ec1-4252-b9e6-2fd7566c6f7d","USERNAME":"null","LOGINTIMESTAMP":"2021-10-31"}
]
See: man sed and The Stack Overflow Regular Expressions FAQ
So below is the final solution I used for this. It's not the prettiest, but it works with no memory issues and does what I need. Thanks to Cyrus for helping. Hope this helps someone out.
find *.json | while read -r file; do
    _FILESIZE=$(stat -c%s "$file")
    if [[ $_FILESIZE -gt 2050000000 ]]; then
        echo "${file} is too large = ${_FILESIZE} bytes. Will be split to work on."
        # Get the name of the file without extension.
        _FILENAME=$(echo "${file}" | sed -r "s/(.+)(\..+)/\1/")
        # Split the large file: 3-character numeric suffix, 1G chunks, no zero-byte files.
        split -a 3 -e -d -b1G "${file}" "${_FILENAME}_"
        # Because the pipe runs in a new shell, you must do it this way.
        _FINAL_FILE_NAME_SPLIT=
        while read -r file_split; do
            _FINAL_FILE_NAME_SPLIT=${file_split}
        done < <(find "${_FILENAME}"_* | sort)
        # The last file has the change we need to make ## "null"}, \n ] ## to ## "null"} \n ] ##
        sed -i -zE 's/},([^,]*)$/}\1/' "${_FINAL_FILE_NAME_SPLIT}"
        # Rebuild the split files to replace the original file.
        cat "${_FILENAME}"_* > "${file}"
        # Remove the split files.
        rm -f "${_FILENAME}"_*
    else
        sed -i -zE 's/},([^,]*)$/}\1/' "${file}"
    fi
    # Check that the file is valid JSON.
    jq '. | length' "${file}"
    # View the change.
    tail -c 50 "${file}"
    echo " "
    echo " "
done

how to check if a word contains all letters in a string bash

Let's say I have a file containing words (one per line), and I have a string containing letters:
str="aeiou"
I want to check how many words in the file contain all the letters in the string. They don't have to appear in order.
The first thing that came to mind was using cat and grep:
cat wordfile | grep a | grep e | grep i | grep letters....
This seems to work, but I wonder if there's a better way.
If the search string is fixed, you might try something like this:
cat wordfile | awk '/a/&&/e/&&/i/&&/o/&&/u/' | wc -l
If needed, the search pattern can easily be built using your favorite scripting language. As I favor Python (ported here to python3):
str="aeiou"
search=$(python3 -c 'print("/" + "/&&/".join("'"$str"'") + "/")')
cat wordfile | awk "$search" | wc -l
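For what it's worth, the same awk pattern can be built in pure bash, avoiding the Python dependency (a sketch; assumes str is non-empty):
str="aeiou"
search="/${str:0:1}/"
for ((i=1; i<${#str}; i++)); do
search+="&&/${str:$i:1}/"
done
awk "$search" wordfile | wc -l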
Here is a solution done solely in bash. Note that [[ ]] makes this non-portable to sh. The script reads every line in the file and tests that it contains every character in str. The file to read must be the first argument to the script. The comments below describe the operation:
#!/bin/bash
str=aeiou
while read line || test -n "$line"; do # read every line in file
match=0; # initialize match = true
for ((i=0; i<${#str}; i++)); do # for each letter in string
[[ $line =~ ${str:$i:1} ]] || { # test it is contained in line - or
match=1 # set match false and
break # break - goto next word
}
done
# if match still true, then all letters in string found in line
test "$match" -eq 0 && echo "all found in '$line'";
done < "$1"
exit 0
testfile (dat/vowels.txt):
a_even_ice_dough_ball
a_even_ice_ball
someword
notallvowels
output:
$ bash vowel.sh dat/vowels.txt
all found in 'a_even_ice_dough_ball'
Messy, but it can be done in one step by turning on the PCRE regex flag of GNU grep:
grep -P '^(?=.*a.*)(?=.*e.*)(?=.*i.*)(?=.*o.*)(?=.*u.*)' file | wc -l
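Since grep can count matching lines itself, the pipe to wc -l can also be dropped by adding -c (same pattern, one process fewer):
grep -cP '^(?=.*a.*)(?=.*e.*)(?=.*i.*)(?=.*o.*)(?=.*u.*)' file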

Count number of words in file, bash script

How could I go about printing the number of words in a specified file in a bash script? For example, it will be run as
cat test | ./bash_script.sh
cat test
Hello World
This is a test
The output of running cat test | ./bash_script.sh would look like:
Word count: 6.
I am aware that it can be done without a script. I am trying to implement wc -w in a bash script that will count the words as shown above. Any help is appreciated! Thank you.
If given a stream of input as shown:
while read -a words; do (( num += ${#words[@]} )); done
echo Word count: $num.
Extending from the link @FredrikPihl gave in a comment, this reads from each file given as an argument, or from stdin if no files are given:
for f in "${@:-/dev/stdin}"; do
while read -a words; do (( num += ${#words[@]} )); done < "$f"
done
echo Word count: $num.
This should be faster:
for f in "${@:-/dev/stdin}"; do
words=( $(< "$f") )
(( num += ${#words[@]} ))
done
echo Word count: $num.
In pure bash:
read -a arr -d $'\004'
echo ${#arr[@]}
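A variant that doesn't depend on a literal EOT byte appearing in the input: with an empty delimiter, read consumes everything up to end-of-file (it returns non-zero there, but the array is still populated):
read -rd '' -a arr < file
echo ${#arr[@]}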
Try this:
wc -w *.md | grep total | awk '{print $1}'
#!/bin/bash
word_count=$(wc -w)
echo "Word count: $word_count."
As pointed out by @keshlam in the comments, this can easily be done by executing wc -w from the shell script; I didn't understand what its use case would be. The shell script above will, however, work as per your requirement.
I believe what you need is a function that you could add to your .bashrc:
function script1() { wc -w "$1"; }
script1 README.md
335 README.md
You can add the function to your .bashrc file and name it whatever you want. Upon your next console, or after you source your .bashrc, the function will be loaded; from then on you can call the function name with a file, as shown, and it will give you the count.
You could expand the contents of the file as arguments and echo the number of arguments in the script.
$# Expands to the number of script arguments
#!/bin/bash
echo "Word count: $#."
Then execute:
./bash_script.sh $(cat file)
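Note that the unquoted $(cat file) undergoes pathname expansion as well as word splitting, which is exactly the /* pitfall from the first question above; if that matters, globbing can be switched off around the call (a sketch):
set -f
./bash_script.sh $(cat file)
set +f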

script not reading last line of a file

I have a file created in Windows using Notepad:
26453215432460
23543265235421
38654365876325
12354152435243
I have a script which reads every line and creates a command like the one below in another file for every line, ignoring blank lines:
CRE:EQU,264532154324600,432460,1;
Now if I save my input file after hitting Enter at the end of the last number, 12354152435243, then the output file contains the command above for all numbers (including the last 12354152435243):
CRE:EQU,264532154324600,432460,1;
CRE:EQU,235432652354210,235421,1;
CRE:EQU,386543658763250,876325,1;
CRE:EQU,123541524352430,435243,1;
But if I save the file without hitting Enter after the last number is keyed in, i.e. after 12354152435243, then after the script executes I don't see a command for the last number in the output file:
CRE:EQU,264532154324600,432460,1;
CRE:EQU,235432652354210,235421,1;
CRE:EQU,386543658763250,876325,1;
Can somebody explain the error in the code:
while read LINE
do
[ -z "$LINE" ] && continue
IMEI=`echo $LINE | sed 's/ //g' | sed -e 's/[^ -~]//g'`
END_SERIAL=`echo $IMEI | cut -c9- | sed 's/ //g' | sed -e 's/[^ -~]//g'`
echo "CRE:EQU,${IMEI}0,${END_SERIAL},${list},,${TODAY};" >> /apps/ins/list.out
done < "${FILE_NAME}"
kindly help
Use
grep . "${FILE_NAME}" | while read LINE
or
while read LINE
do
....
done < <(grep . "${FILE_NAME}")
grep is less sensitive to line endings, and you get the skipping of empty lines for free... :)
Honestly, I've never tried Windows; all of the above is OK for Unix...
EDIT: Explanation.
Create the following file:
echo -n -e 'line\n\nanother\nno line ending here>' >file.txt
the file contains 4 lines (although the last "line" is not a "correct" one)
line
another
no line ending here>
The usual shell routines, such as read and wc, look for a line ending. Therefore,
$ wc -l file.txt
3 file.txt
When you grep for '' (the empty string), grep returns every line in which it finds the string, so
$ grep '' file.txt
prints
line
another
no line ending here>
When grep prints the lines it found, it ensures that one `\n' exists at the end, so
$ grep '' file.txt | wc -l
returns
4
Therefore, for these situations, it is better to use grep with -c (count) rather than wc:
$ grep -c '' file.txt
4
Now, the . dot. The dot means any character. So, when you grep for ., you get all lines that contain at least one character, and it therefore skips all lines that don't contain any character = skips empty lines. So,
$ grep . file.txt
line
another
no line ending here>
again, with a line ending added to the last line (and the empty line skipped). Remember, the space is a character too, so a line that contains only one space is NOT EMPTY. Counting non-empty lines:
$ grep . file.txt | wc -l
3
or faster
$ grep -c . file.txt
3
If you do a help read, it says of -d delim: continue until the first character of DELIM is read, rather than newline. So read will continue until it hits \n, or the first character of the delimiter you specify with -d. So you probably need to change the delimiter, or you can try read -e.
read reads until a newline is found, and when it finds one it returns the line. But if the file ends without a newline, read treats this as an error: even though read has set the variable with everything read so far, its return code indicates failure. Now, the while read ... loop body only executes if the command succeeds, which is not the case here. Thus you miss the last line.
To overcome this, you can change the condition to also check whether the variable is non-empty. Then the condition succeeds even if read fails, because the variable has already been set with the end of the file.
This is not related to line endings on different OSes. I mean, it's somewhat related, but the exact root cause is always that read fails to find a newline at the end of the file, and so the last line misses the loop body.
Below is an example
[[bash_prompt$]]$ echo -ne 'hello\nthere' > log
[[bash_prompt$]]$ while read line; do echo $line; done < log
hello
[[bash_prompt$]]$ while read line || [ -n "$line" ]; do echo $line; done < log
hello
there
[[bash_prompt$]]$
read needs the end of line to read the input. Try
echo -n $'a\nb' | while read x ; do echo $x ; done
It only prints a.
To prevent a script from not reading the last line of a file:
cat "somefile" | { cat ; echo ; } | while read line; do echo $line; done
Source : My open source project https://sourceforge.net/projects/command-output-to-html-table/
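One caveat with this trick: the unconditional echo appends a newline even when the file already ends with one, so the loop may see one extra empty line; the [ -z ... ] && continue guard from the question's own script covers that (a sketch):
cat "somefile" | { cat ; echo ; } | while read line; do
[ -z "$line" ] && continue
echo $line
done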

How to split a list by comma not space

I want to split the text on commas, not spaces, in for foo in list. Suppose I have a CSV file CSV_File with the following text inside it:
Hello,World,Questions,Answers,bash shell,script
...
I used following code to split it into several words:
for word in $(cat CSV_File | sed -n 1'p' | tr ',' '\n')
do echo $word
done
It prints:
Hello
World
Questions
Answers
bash
shell
script
But I want it to split the text by commas, not spaces:
Hello
World
Questions
Answers
bash shell
script
How can I achieve this in bash?
Set IFS to ,:
sorin@sorin:~$ IFS=',' ;for i in `echo "Hello,World,Questions,Answers,bash shell,script"`; do echo $i; done
Hello
World
Questions
Answers
bash shell
script
sorin@sorin:~$
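If you'd rather not leave IFS changed for the rest of your session, the assignment can be confined to a subshell (a sketch with an illustrative variable name):
line="Hello,World,Questions,Answers,bash shell,script"
( IFS=','; for i in $line; do echo "$i"; done )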
Using a subshell substitution to parse the words undoes all the work that tr did to keep the multi-word fields together.
Try instead:
cat CSV_file | sed -n 1'p' | tr ',' '\n' | while read word; do
echo $word
done
That also increases parallelism. Using a subshell as in your question forces the entire subshell process to finish before you can start iterating over the results. Piping to a subshell (as in my answer) lets them work in parallel. This matters only if you have many lines in the file, of course.
I think the canonical method is:
while IFS=, read field1 field2 field3 field4 field5 field6; do
do stuff
done < CSV.file
If you don't know or don't care about how many fields there are:
IFS=,
while read line; do
# split into an array
field=( $line )
for word in "${field[#]}"; do echo "$word"; done
# or use the positional parameters
set -- $line
for word in "$#"; do echo "$word"; done
done < CSV.file
kent$ echo "Hello,World,Questions,Answers,bash shell,script"|awk -F, '{for (i=1;i<=NF;i++)print $i}'
Hello
World
Questions
Answers
bash shell
script
Create a bash function
split_on_commas() {
local IFS=,
local WORD_LIST=($1)
for word in "${WORD_LIST[#]}"; do
echo "$word"
done
}
split_on_commas "this,is a,list" | while read item; do
# Custom logic goes here
echo Item: ${item}
done
... this generates the following output:
Item: this
Item: is a
Item: list
(Note, this answer has been updated according to some feedback)
Read: http://linuxmanpages.com/man1/sh.1.php
& http://www.gnu.org/s/hello/manual/autoconf/Special-Shell-Variables.html
IFS The Internal Field Separator that is used for word splitting
after expansion and to split lines into words with the read
builtin command. The default value is ``<space><tab><newline>''.
IFS is a shell variable, so a change to it persists within the context of your shell script but not outside it, unless you export it. ALSO BE AWARE that IFS will likely not be inherited from your environment at all: see this GNU post for the reasons and more info on IFS.
Your code, written like this:
IFS=","
for word in $(cat tmptest | sed -n 1'p' | tr ',' '\n'); do echo $word; done;
should work, I tested it on command line.
sh-3.2#IFS=","
sh-3.2#for word in $(cat tmptest | sed -n 1'p' | tr ',' '\n'); do echo $word; done;
World
Questions
Answers
bash shell
script
You can use:
cat f.csv | sed 's/,/ /g' | awk '{print $1 " / " $4}'
or
echo "Hello,World,Questions,Answers,bash shell,script" | sed 's/,/ /g' | awk '{print $1 " / " $4}'
This is the part that replaces commas with spaces:
sed 's/,/ /g'
For me, using array splitting is simpler (ref):
IN="bla@some.com;john@home.com"
arrIN=(${IN//;/ })
echo ${arrIN[1]}
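Be aware that ${IN//;/ } turns every ; into a space and then relies on unquoted word splitting, so it would break items that themselves contain spaces (like bash shell in this question). A sketch of a safer variant using read with a custom IFS:
IN="bla@some.com;john@home.com"
IFS=';' read -ra arrIN <<< "$IN"
echo "${arrIN[1]}"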
Using readarray (mapfile):
$ cat csf
Hello,World,Questions,Answers,bash shell,script
$ readarray -td, arr < csf
$ printf '%s\n' "${arr[#]}"
Hello
World
Questions
Answers
bash shell
script
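One subtlety: the file's trailing newline ends up inside the last array element, since -t strips only the comma delimiter. A sketch that strips the newline first via command substitution:
readarray -td, arr < <(printf '%s' "$(<csf)")
printf '%s\n' "${arr[@]}"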
