Shell script to trim output and join it into a single line - linux

I have a command
pdftotext -f 3 -l 3 -x 205 -y 40 -W 180 -H 75 -layout input.pdf -
When run, it produces the output below:
[[_थी] 2206255388
नाव मीराबाई sad
पतीचे नाव dame
| घर क्रमांक Photo's |
|वय 51 लिंग महिला Available |
I need each line to be enclosed in double quotes and then joined into a single comma-separated line. How can I do this with a shell command?

As an example, you could modify the output of your command like this:
cat <<EOF | sed 's/\(.*\)/\"\1\"/g' | tr '\n' ',' | sed 's/.$//'
> foobar
> bar
> foo
> EOF
"foobar","bar","foo"
The first sed adds the double quotes, tr replaces each newline with a comma, and the last sed removes the trailing comma.
So, your command will be:
pdftotext -f 3 -l 3 -x 205 -y 40 -W 180 -H 75 -layout input.pdf - | sed 's/\(.*\)/\"\1\"/g' | tr '\n' ',' | sed 's/.$//'
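An equivalent approach (just a sketch, built on the same pdftotext invocation shown above) does the quoting and joining in a single awk pass, which avoids having to strip a trailing comma afterwards:
pdftotext -f 3 -l 3 -x 205 -y 40 -W 180 -H 75 -layout input.pdf - | awk '{printf "%s\"%s\"", (NR > 1 ? "," : ""), $0} END {print ""}'
The NR > 1 test prepends a comma before every line except the first, so no trailing separator is ever produced.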

Related

Insert a space after the second character followed by every three characters

I need to insert a space after the first two characters, and then a space after every three characters.
Data:
97100101101102101
Expected Output:
97 100 101 101 102 101
Attempted Code:
sed 's/.\{2\}/& /3g'
In two steps:
$ sed -r -e 's/^.{2}/& /' -e 's/[^ ]{3}/& /g' <<< 97100101101102101
97 100 101 101 102 101
That is:
's/^.{2}/& /'
matches the first two characters of the line and prints them back followed by a space.
's/[^ ]{3}/& /g'
matches three consecutive non-space characters and prints them back followed by a space.
With GNU awk:
$ echo '97100101101102101' | awk '{print substr($0,1,2) gensub(/.{3}/," &","g",substr($0,3))}'
97 100 101 101 102 101
Note that, unlike the currently accepted sed solution, this will not add a blank character to the end of the line. Using _ instead of a blank makes the issue visible:
$ echo '97100101101102101' | sed -r -e 's/^.{2}/&_/' -e 's/[^_]{3}/&_/g'
97_100_101_101_102_101_
$ echo '97100101101102101' | awk '{print substr($0,1,2) gensub(/.{3}/,"_&","g",substr($0,3))}'
97_100_101_101_102_101
and it would still work even if the input contained blank characters:
$ echo '971 0101101102101' | sed -r -e 's/^.{2}/& /' -e 's/[^ ]{3}/& /g'
97 1 010 110 110 210 1
$ echo '971 0101101102101' | awk '{print substr($0,1,2) gensub(/.{3}/," &","g",substr($0,3))}'
97 1 0 101 101 102 101

wc -l is NOT counting the last line of the file if it does not have an end-of-line character

I need to count all lines of a Unix file. The file has 3 lines, but wc -l only gives a count of 2.
I understand that it is not counting the last line because it does not have an end-of-line character.
Could anyone please tell me how to count that line as well?
grep -c returns the number of matching lines. Just use an empty string "" as your matching expression:
$ echo -n $'a\nb\nc' > 2or3.txt
$ cat 2or3.txt | wc -l
2
$ grep -c "" 2or3.txt
3
It is better to have all lines ending with EOL \n in Unix files. You can do:
{ cat file; echo ''; } | wc -l
Or this awk:
awk 'END{print NR}' file
This approach will give the correct line count regardless of whether the last line in the file ends with a newline or not.
awk will make sure that, in its output, each line it prints ends with a new line character. Thus, to be sure each line ends in a newline before sending the line to wc, use:
awk '1' file | wc -l
Here, we use the trivial awk program that consists solely of the number 1. awk interprets this cryptic statement to mean "print the line" which it does, being assured that a trailing newline is present.
Examples
Let us create a file with three lines, each ending with a newline, and count the lines:
$ echo -n $'a\nb\nc\n' >file
$ awk '1' file | wc -l
3
The correct number is found.
Now, let's try again with the last new line missing:
$ echo -n $'a\nb\nc' >file
$ awk '1' file | wc -l
3
This still provides the right number. awk automatically corrects for a missing newline but leaves the file alone if the last newline is present.
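If you want to repair the file itself by appending the missing final newline, rather than just count it correctly, a small sketch is to test the last byte first (this assumes a tail that supports -c, as the GNU and BSD versions do):
# Append a newline only if the last byte of the file is not already one.
[ -n "$(tail -c1 file)" ] && echo >> file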
Respect
I respect the answer from John1024 and would like to expand upon it.
Line Count function
I find myself comparing line counts a lot, especially from the clipboard, so I have defined a bash function. I'd like to modify it to show the filenames and, when passed more than one file, a total; however, it hasn't been important enough for me to do so far (a rough sketch of that variant is shown after the function below).
# semicolons used because this is condensed to 1 line in my ~/.bash_profile
function wcl(){
    if [[ -z "${1:-}" ]]; then
        set -- /dev/stdin "$@";
    fi;
    for f in "$@"; do
        awk 1 "$f" | wc -l;
    done;
}
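A rough sketch of the filename-plus-total variant mentioned above (purely illustrative; the name wclt and the output format are invented here, and it keeps the awk 1 trick so files without a final newline are still counted correctly):
function wclt(){
    local total=0 n f
    for f in "$@"; do
        # count the lines of each file, fixing a missing final newline on the fly
        n=$(awk 1 "$f" | wc -l)
        printf '%8d %s\n' "$n" "$f"
        total=$((total + n))
    done
    # print a grand total only when more than one file was given
    (( $# > 1 )) && printf '%8d total\n' "$total"
}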
Counting lines without the function
# Line count of the file
$ cat file_with_newline | wc -l
3
# Line count of the file
$ cat file_without_newline | wc -l
2
# Line count of the file unchanged by cat
$ cat file_without_newline | cat | wc -l
2
# Line count of the file changed by awk
$ cat file_without_newline | awk 1 | wc -l
3
# Line count of the file changed by only the first call to awk
$ cat file_without_newline | awk 1 | awk 1 | awk 1 | wc -l
3
# Line count of the file unchanged by awk because it ends with a newline character
$ cat file_with_newline | awk 1 | awk 1 | awk 1 | wc -l
3
Counting characters (why you don't want to put a wrapper around wc)
# Character count of the file
$ cat file_with_newline | wc -c
6
# Character count of the file unchanged by awk because it ends with a newline character
$ cat file_with_newline | awk 1 | awk 1 | awk 1 | wc -c
6
# Character count of the file
$ cat file_without_newline | wc -c
5
# Character count of the file changed by awk
$ cat file_without_newline | awk 1 | wc -c
6
Counting lines with the function
# Line count function used on stdin
$ cat file_with_newline | wcl
3
# Line count function used on stdin
$ cat file_without_newline | wcl
3
# Line count function used on filenames passed as arguments
$ wcl file_without_newline file_with_newline
3
3

Getting the total size of a directory as a number with du

Using the command du, I would like to get the total size of a directory
Output of command du myfolder:
5454 kkkkk
666 aaaaa
3456788 total
I'm able to extract the last line, but not to remove the string total:
du -c myfolder | grep total | cut -d ' ' -f 1
Results in:
3456788 total
Desired result
3456788
I would like to have the whole command on one line.
That's probably because it's tab delimited (which is the default delimiter of cut):
~$ du -c foo | grep total | cut -f1
4
~$ du -c foo | grep total | cut -d' ' -f1
4
To insert a literal tab on the command line, use Ctrl+V, then Tab.
Alternatively, you could use awk to print the first field of the line ending with total:
~$ du -c foo | awk '/total$/{print $1}'
4
First off, you probably want to use tail -n1 instead of grep total ... Consider what happens if you have a directory named local? :-)
Now, let's look at the output of du with hexdump:
$ du -c tmp | tail -n1 | hexdump -C
00000000 31 34 30 33 34 34 4b 09 74 6f 74 61 6c 0a |140344K.total.|
That's the character 0x09 after the K; man ascii tells us:
011 9 09 HT '\t' (horizontal tab) 111 73 49 I
It's a tab, not a space :-)
The tab character is already the default delimiter (this is specified in the POSIX spec, so you can safely rely on it), so you don't need -d at all.
So, putting that together, we end up with:
$ du -c tmp | tail -n1 | cut -f1
140344K
Why don't you use -s to summarize it? This way you don't have to grep "total", etc.
$ du .
24 ./aa/bb
...
# many lines
...
2332 .
$ du -hs .
2.3M .
Then, to get just the value, pipe to awk. This way you don't have to worry about the delimiter being a space or a tab:
du -s myfolder | awk '{print $1}'
From man du:
-h, --human-readable
print sizes in human readable format (e.g., 1K 234M 2G)
-s, --summarize
display only a total for each argument
I would suggest using awk for this:
value=$(du -c myfolder | awk '/total/{print $1}')
This simply extracts the first field of the line that matches the pattern "total".
If it is always the last line that you're interested in, an alternative would be to use this:
value=$(du -c myfolder | awk 'END{print $1}')
The values of the fields in the last line are accessible in the END block, so you can get the first field of the last line this way.
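If your du is GNU du, a variant that avoids grep/tail/awk entirely is to summarize just the directory and cut the first (tab-separated) field; -b reports the apparent size in bytes (a sketch, GNU-specific; on other systems du -sk gives kilobytes instead):
size=$(du -sb myfolder | cut -f1)
echo "$size"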

How to count the number of fields in a comma-separated line where commas within brackets are not to be counted as separators?

Let's say I have the following line in my file:
HELLO,1410250216446000,1410250216470330,1410250216470367,329,PE,B,T,GALU,[ , , T, I],3.38,3,A,A, , , , ,0, ,0,0, ,-Infinity,-Infinity,-Infinity, ,,0
if I use
grep -a -w HELLO my_file | head -10 | awk -F '[\t,]' '{print NF}' | less
output is 32.
But I don't want to count the commas within []. I mean [ , , T, I] must be counted as a single word, so that the output of my query is 29.
What will be one line command for doing this in Linux?
Remove the content inside the brackets using sed, then continue counting:
grep -a -w HELLO my_file|sed "s/\[.*\]//g" | head -10 | awk -F '[\t,]' '{print NF}' | less
output
29
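Note that .* is greedy, so if a line ever contains more than one bracketed group, s/\[.*\]//g would delete everything from the first [ to the last ]. A safer variant of the same idea (a sketch, only the bracket pattern changes) matches only non-] characters inside the brackets:
grep -a -w HELLO my_file | sed 's/\[[^]]*\]//g' | head -10 | awk -F '[\t,]' '{print NF}' | less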

How to do a mathematical operation on the first fields and assign the result to a variable - linux

I have a file containing just 2 numbers, one number on each line.
4.1865E+02
4.1766E+02
I know it's something like BHF = ($1 from line 1 - $1 from line 2),
but I can't find the exact command.
How can I do a mathematical operation on them and save the result to a variable?
PS: This was obtained using
sed -i -e '/^$/d' nodout15
sed -i -e 's/^[ \t]*//;s/[ \t]*$//' nodout15
awk ' {print $13} ' nodout15 > 15
mv 15 nodout15
sed -i -e '/^$/d' nodout15
sed -i -e 's/^[ \t]*//;s/[ \t]*$//' nodout15
sed -n '/^[0-9]\{1\}/p' nodout15 > 15
mv 15 nodout15
tail -2 nodout15 > 15
mv 15 nodout15
After all this I have these two numbers, and now I am not able to do the arithmetic. If possible, please tell me a shorter way to do it on the spot rather than all this juggling. nodout15 is a file whose lines have different numbers of columns, so I am only interested in the 13th column. Since not all lines will end up in the daughter file, the empty lines are deleted; then only the lines starting with a number are taken; then the last two lines, as they show the final state. The difference between them will lead to a conditional statement, so I need to save it in a variable.
regards.
awk
$ BHF=`awk -v RS='' '{print $1-$2}' input.txt`
$ echo $BHF
0.99
bc
$ BHF=`cat input.txt | xargs printf '%f-%f\n' | bc`
$ echo $BHF
.990000
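As for doing it on the spot without the intermediate files: all of the filtering and the final subtraction can be done in one awk pass over the original file (a sketch; it assumes, as described in the question, that the value of interest is the 13th field of lines whose value starts with a digit, and that the wanted difference is second-to-last minus last):
# keep the 13th field of qualifying lines, remember the last two, subtract at the end
BHF=$(awk '$13 ~ /^[0-9]/ {prev=last; last=$13} END {print prev - last}' nodout15)
echo "$BHF"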
