Program is about space utilisation. I am getting error: 72G value too great for base (error token is "72") [duplicate] - linux

Is there a shell command that simply converts back and forth between a number string in bytes and the "human-readable" number string offered by some commands via the -h option?
To clarify the question: ls -l without the -h option (some output suppressed)
> ls -l
163564736 file1.bin
13209 file2.bin
gives the size in bytes, while with the -h option (some output suppressed)
> ls -lh
156M file1.bin
13K file2.bin
the size is human-readable, in kilobytes and megabytes.
Is there a shell command that simply turns 163564736 into 156M and 13209 into 13K, and also does the reverse?

numfmt
To:
echo "163564736" | numfmt --to=iec
From:
echo "156M" | numfmt --from=iec

There is no standard (cross-platform) tool to do it, but a solution using awk is described here.
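For reference, a rough awk sketch of the byte-to-IEC direction (an illustration only, not the linked answer; it handles powers of 1024 up to terabytes):
echo 163564736 | awk '{
    split("B K M G T", unit, " ")
    i = 1
    while ($1 >= 1024 && i < 5) { $1 /= 1024; i++ }
    printf "%.0f%s\n", $1, unit[i]   # prints 156M for this input
}'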

Related

Calculate the total size of all files from a generated folders list with full PATH

I have a list containing multiple directories with the full PATH:
/mnt/directory_1/sub_directory_1/
/mnt/directory_2/
/mnt/directory_3/sub_directory_3/other_directories_3/
I need to calculate the total size of this list.
From Get total size of a list of files in UNIX
du -ch $file_list | tail -1 | cut -f 1
This was the closest answer I could find, but it gave me the following error message:
bash: /bin/du: Argument list too long
Do not use backticks (`). Use $(...) instead.
Do not use:
command $(cat something)
This is a common anti-pattern. It works for simple cases but fails for many more, because the result of $(...) undergoes word splitting and filename expansion.
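To see why, try it with a single directory name that contains spaces (a hypothetical path, purely for illustration):
printf '%s\n' '/mnt/dir with spaces/' > file_list.txt
du -sch $(cat file_list.txt)   # du receives three arguments: "/mnt/dir" "with" "spaces/"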
Check your scripts with http://shellcheck.net
If you want to "run a command with arguments from a file", use xargs or write a loop. Read https://mywiki.wooledge.org/BashFAQ/001. Also, xargs handles too many arguments by itself, and I would also add -s to du. Try:
xargs -d'\n' du -sch < file_list.txt | tail -1 | cut -f 1
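If you prefer the loop alternative mentioned above, a sketch that sums the sizes itself (du -sb and numfmt are GNU-specific) could look like this:
total=0
while IFS= read -r dir; do
    size=$(du -sb "$dir" | cut -f1)   # size of this directory in bytes
    total=$((total + size))
done < file_list.txt
numfmt --to=iec "$total"              # print the grand total in human-readable form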

Fast string search in a very large file

What is the fastest method for searching for lines in a file that contain a string? I have a file containing the strings to search for. This small file (smallF) contains about 50,000 lines and looks like:
stringToSearch1
stringToSearch2
stringToSearch3
I have to search for all of these strings in a larger file (about 100 million lines). If any line in this larger file contains a search string, the line is printed.
The best method I have come up with so far is
grep -F -f smallF largeF
But this is not very fast. With just 100 search strings in smallF it takes about 4 minutes. For over 50,000 search strings it will take a lot of time.
Is there a more efficient method?
I once noticed that using -E or multiple -e parameters is faster than using -f. Note that this might not be applicable to your problem, as you are searching for 50,000 strings in a larger file. However, I wanted to show you what can be done and what might be worth testing:
Here is what I noticed in detail:
Have 1.2GB file filled with random strings.
>ls -has | grep string
1,2G strings.txt
>head strings.txt
Mfzd0sf7RA664UVrBHK44cSQpLRKT6J0
Uk218A8GKRdAVOZLIykVc0b2RH1ayfAy
BmuCCPJaQGhFTIutGpVG86tlanW8c9Pa
etrulbGONKT3pact1SHg2ipcCr7TZ9jc
.....
Now I want to search for strings "ab", "cd" and "ef" using different grep approaches:
Using grep without flags, search one at a time:
grep "ab" strings.txt > m1.out
2,76s user 0,42s system 96% cpu 3,313 total
grep "cd" strings.txt >> m1.out
2,82s user 0,36s system 95% cpu 3,322 total
grep "ef" strings.txt >> m1.out
2,78s user 0,36s system 94% cpu 3,360 total
So in total the search takes nearly 10 seconds.
Using grep with the -f flag, with the search strings in search.txt:
>cat search.txt
ab
cd
ef
>grep -F -f search.txt strings.txt > m2.out
31,55s user 0,60s system 99% cpu 32,343 total
For some reason this takes nearly 32 seconds.
Now using multiple search patterns with -e
grep -E "ab|cd|ef" strings.txt > m3.out
3,80s user 0,36s system 98% cpu 4,220 total
or
grep --color=auto -e "ab" -e "cd" -e "ef" strings.txt > /dev/null
3,86s user 0,38s system 98% cpu 4,323 total
The third method, using -E, only took 4.22 seconds to search through the file.
Now let's check if the results are the same:
cat m1.out | sort | uniq > m1.sort
cat m3.out | sort | uniq > m3.sort
diff m1.sort m3.sort
#
The diff produces no output, which means the found results are the same.
Maybe you want to give it a try; otherwise I would advise you to look at the thread "Fastest possible grep" (see the comment from Cyrus).
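If you do want to experiment with this approach using patterns kept in a file, one way (a sketch only; it assumes the patterns contain no regex metacharacters and that the joined pattern fits within the shell's argument-length limit) is to build the alternation from smallF:
pattern=$(paste -sd'|' smallF)   # join the lines of smallF with "|"
grep -E "$pattern" largeF > matches.out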
You may want to try sift or ag. Sift in particular lists some pretty impressive benchmarks versus grep.
Note: I realise the following is not a bash-based solution, but given your large search space, a parallel solution is warranted.
If your machine has more than one core/processor, you could compile the following function with Pythran to parallelize the search:
#!/usr/bin/env python
#pythran export search_in_file(string, string)
def search_in_file(long_file_path, short_file_path):
    # Read the large file once; a plain file handle would be exhausted
    # after the first membership test.
    _long = open(long_file_path, "r").read()
    #omp parallel for schedule(guided)
    for _string in open(short_file_path, "r"):
        # Strip the trailing newline before the substring test.
        if _string.strip() in _long:
            print(_string.strip())

if __name__ == "__main__":
    search_in_file("long_file_path", "short_file_path")
Note: Behind the scenes, Pythran takes Python code and attempts to compile it aggressively into very fast C++.
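If you go this route, the rough workflow (assuming the sketch above is saved as search_in_file.py; the file name is only an example, and the exact flags are documented by Pythran) would be:
pythran -fopenmp search_in_file.py    # compile to a native extension module
python -c 'import search_in_file; search_in_file.search_in_file("largeF", "smallF")'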

How to take advantage of filters

I've read here that
To make a pipe, put a vertical bar (|) on the command line between two commands.
then
When a program takes its input from another program, performs some operation on that input, and writes the result to the standard output, it is referred to as a filter.
So I first tried the ls command, whose output is:
Desktop HelloWord.java Templates glassfish-4.0
Documents Music Videos hs_err_pid26742.log
Downloads NetBeansProjects apache-tomcat-8.0.3 mozilla.pdf
HelloWord Pictures examples.desktop netbeans-8.0
Then I tried ls | echo, which outputs absolutely nothing.
I'm looking for a way to take advantage of pipelines and filters in my bash script. Please help.
echo doesn't read from standard input. It only writes its command-line arguments to standard output. The cat command is what you want; it copies what it reads from standard input to standard output.
ls | cat
(Note that the pipeline above is a little pointless, but does demonstrate the idea of a pipe. The command on the right-hand side must read from standard input.)
Don't confuse command-line arguments with standard input.
echo doesn't read standard input. For something more useful, try
ls | sort -r
to get the output sorted in reverse,
or
ls | grep '[0-9]'
to only keep the lines containing digits.
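Filters also compose, so the two examples above can be chained and the output of one becomes the input of the next:
ls | grep '[0-9]' | sort -r   # keep lines containing digits, then reverse-sort them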
In addition to what others have said: if your command (echo in this example) does not read from standard input, you can use xargs to "feed" that command arguments from standard input, so
ls | echo
doesn't work, but
ls | xargs echo
works fine.

My bash script uses so much memory

I was trying to find out which program is using my memory and where the leak is.
I found it: the leak is in a bash script.
But how is that possible? Does a bash script always allocate new space for each variable assignment?
My bash script looks like the following; please let me know how I can correct this problem.
CONF="/conf/my.cfg"
HIGHRES="/data/high.dat"
getPeriod()
{
    meas=`head -n 1 $CONF`
    statperiod=`echo $meas`
}

(while true
do
    lastline=`tail -n 1 $HIGHRES |cut -d"," -f2`
    linenumber=`grep -n $lastline $HIGHRES | cut -f1 -d:`
    /bin/stat $linenumber
    getPeriod
    sleep $statperiod
done)
EDIT #1:
The last line of high.dat
2013-02-11,10:59:13,1,0,0,0,0,0,0,0,0,12.340000,0.330000,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,24.730000,24.709990,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
I was unable to verify a memory leak with a close approximation of that script, so maybe the leak isn't actually where you think it is. Consider updating your question with much more info, including a complete working example along with what you did to figure out that you had a memory leak.
That said, you have chosen quite an odd way to find out how many lines a file has. The most usual way would be to use the standard wc tool:
$ wc -l < test.txt
19
$
Note: Use < file instead of passing the file name, since the latter will cause the file name to be written to stdout, and you'll then have to edit it away:
$ wc -l test.txt
19 test.txt
$
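Inside the script itself that would look something like this (a sketch using the variable names from the question):
linenumber=$(wc -l < "$HIGHRES")   # line count without the file name attached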

How do I list one filename per output line in Linux?

I'm using the ls -a command to get the file names in a directory, but the output is on a single line.
Like this:
. .. .bash_history .ssh updater_error_log.txt
I need a built-in alternative to get filenames, each on a new line, like this:
.
..
.bash_history
.ssh
updater_error_log.txt
Use the -1 option (note this is the digit "one", not a lowercase letter "L"), like this:
ls -1a
First, though, make sure your ls supports -1. GNU coreutils (installed on standard Linux systems) and Solaris do; but if in doubt, use man ls or ls --help or check the documentation. E.g.:
$ man ls
...
-1 list one file per line. Avoid '\n' with -q or -b
Yes, you can easily make ls output one filename per line:
ls -a | cat
Explanation: The command ls senses if the output is to a terminal or to a file or pipe and adjusts accordingly.
So, if you pipe ls -a to python it should work without any special measures.
ls is designed for human consumption, and you should not parse its output.
In shell scripts, there are a few cases where parsing the output of ls is the simplest way of achieving the desired effect. Since ls might mangle non-ASCII and control characters in file names, these cases are a subset of those that do not require obtaining a file name from ls.
In Python, there is absolutely no reason to invoke ls. Python has all of ls's functionality built in. Use os.listdir to list the contents of a directory and os.stat or os.lstat to obtain file metadata. Other functions in the os module are likely to be relevant to your problem as well.
If you're accessing remote files over ssh, a reasonably robust way of listing file names is through sftp:
echo ls -1 | sftp remote-site:dir
This prints one file name per line, and unlike the ls utility, sftp does not mangle nonprintable characters. You will still not be able to reliably list directories where a file name contains a newline, but that's rarely done (remember this as a potential security issue, not a usability issue).
In Python (beware that shell metacharacters must be escaped in remote_dir):
command_line = "echo ls -1 | sftp " + remote_site + ":" + remote_dir
remote_files = os.popen(command_line).read().split("\n")
For more complex interactions, look up sftp's batch mode in the documentation.
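A minimal batch-mode sketch (the batch file and host names here are only placeholders) might look like:
printf 'cd dir\nls -1\n' > batch.txt   # the commands to run on the remote side
sftp -b batch.txt remote-site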
On some systems (Linux, Mac OS X, perhaps some other unices, but definitely not Windows), a different approach is to mount a remote filesystem through ssh with sshfs, and then work locally.
You can use ls -1.
ls -l will also do the job.
You can also use ls -w1
This allows you to set the number of columns.
From manpage of ls:
-w, --width=COLS
set output width to COLS. 0 means no limit
ls | tr ' ' '\n'
Easy, as long as your filenames don't include newlines:
find . -maxdepth 1
If you're piping this into another command, you should probably prefer to separate your filenames by null bytes, rather than newlines, since null bytes cannot occur in a filename (but newlines may):
find . -maxdepth 1 -print0
Printing that on a terminal will probably display as one line, because null bytes are not normally printed. Some programs may need a specific option to handle null-delimited input, such as sort's -z. Your own script similarly would need to account for this.
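For example, the null-delimited list can be passed through null-aware tools and only converted back to newlines for display (sort -z is a GNU extension):
find . -maxdepth 1 -print0 | sort -z | tr '\0' '\n'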
The -1 switch is the obvious way of doing it, but just to mention, another option is using echo and a command substitution within double quotes, which retains the whitespace (here \n):
echo "$(ls)"
Also, how the ls command behaves is mentioned here:
If standard output is a terminal, the output is in columns (sorted
vertically) and control characters are output as question marks;
otherwise, the output is listed one per line and control characters
are output as-is.
Now you see why redirecting or piping the output produces one entry per line.

Resources