How to read a file and write in to another file using a shell script - linux

I have a file which looks like this (file1.txt)
258.2222
I have to write this file1.txt value to another file. if there in no value in file1.txt then
it should print as "Passed".
this is what I tried
for final in $(cat file1.txt);do
if [ "$final" ];then
echo $final > file2.txt
else
echo "Passed" > file2.txt
fi
done
this only works with 1 scenario. if there is no value in file1.txt then it is not writing as "Passed"
expected output:
if there is a value in file1.txt:
258.2222
if there is no value (empty) in file1.txt:
Passed
Can someone help me to figure out this? Thanks in advance!
Note: I am not allowed to use general purpose scripting language (JavaScript, Python etc).

If file1.txt contains something (its file size is non-zero), you want that something to go into file2.txt:
if [ -s file1.txt ]; then
cp file1.txt file2.txt
fi
If file1.txt is empty (or missing), then you want a line saying Passed written to file2.txt:
if [ -s file1.txt ]; then
cp file1.txt file2.txt
else
echo Passed >file2.txt
fi
If you want to avoid creating file2.txt if file1.txt is missing:
if [ -s file1.txt ]; then
cp file1.txt file2.txt
elif [ -e file1.txt ]; then
echo Passed >file2.txt
fi
The echo is now only executed if the -s test fails (the file does not exist, or it is empty) and -e test succeeds (the file exists).
At no point do you actually have to read the data from the file in a loop.

The line for final in $(cat file1.txt); do is always of questionable value. It will munge whitespace, and is generally a bad idea. Sometimes it has a place, but for the most part there are better ways to iterate over the contents of a file. In your case, this is causing an issue because when the file is empty the loop is executed zero times.
It's not clear to me if you are trying to parse the value in some way, or filter it from other content. If you just want to copy the full contents of the file or write "Passed" if the file is empty, you can do:
if test -s file1.txt; then
cat file1.txt
else
echo Passed
fi > file2.txt
or
if ! grep '[^ ]' file1.txt; then echo Passed; fi > file2.txt
The test command in the first case will succeed (return 0) if file1.txt exists and has some content. Note that this may simply be whitespace; if the file having non-zero size is not sufficient for your needs, you may want to use the second solution where grep only succeeds if it matches at least one line that has a non-space character (which may be a tab!). (But note that this grep will filter out lines that are only space!). If the test or the grep returns non-zero (eg, the file is empty or there are no lines that contain non-whitespace), then "Passed" is written. Either way, the output is redirected to file2.txt.

that is mean file1.txt is empty
Then I would harness GNU AWK for this task following way
awk 'END{print NR?$0:"Passed"}' file1.txt > file2.txt
Explanation: I use so called ternary operator condition?valueiftrue:valueiffalse for condition I use just NR (number row) which inside END is total number of rows processed, if file1.txt has one or more rows I print $0 which inside END is last line of file, otherwise i.e. where there is 0 rows in file I print text Passed.
(tested in gawk 4.2.1)

Using any awk:
awk '{print} END{if (!NR) print "Passed"}' file1.txt > file2.txt

Related

bash/awk/unix detect changes in lines of csv files

I have a timestamp in this format:
(normal_file.csv)
timestamp
19/02/2002
19/02/2002
19/02/2002
19/02/2002
19/02/2002
19/02/2002
The dates are usually uniform, however, there are files with irregular dates pattern such as this example:
(abnormal_file.csv)
timestamp
19/02/2002
19/02/2003
19/02/2005
19/02/2006
In my directory, there are hundreds of files that consist of normal.csv and abnormal.csv.
I want to write a bash or awk script that detect the dates pattern in all files of a directory. Files with abnormal.csv should be moved automatically to a new, separate directory (let's say dir_different/).
Currently, I have tried the following:
#!/bin/bash
mkdir dir_different
for FILE in *.csv;
do
# pipe 1: detect the changes in the line
# pipe 2: print the timestamp column (first column, columns are comma-separated)
awk '$1 != prev {print ; prev = $1}' < $FILE | awk -F , '{print $1}'
done
If the timestamp in a given file is normal, then only one single timestamp will be printed; but for abnormal files, multiple dates will be printed.
I am not sure how to separate the abnormal files from the normal files, and I have tried the following:
do
output=$(awk 'FNR==3{print $0}' $FILE)
echo ${output}
if [[ ${output} =~ ([[:space:]]) ]]
then
mv $FILE dir_different/
fi
done
Or is there an easier method to detect changes in lines and separate files that have different lines? Thank you for any suggestions :)
Assuming that none of your "normal" CSV files have trailing newlines this should do the separation just fine:
#!/bin/bash
mkdir -p dir_different
for FILE in *.csv;
do
if awk '{a[$1]++}END{if(length(a)<=2){exit 1}}' "$FILE" ; then
echo mv "$FILE" dir_different
fi
done
After a dry-run just get rid of the echo :)
Edit:
{a[$1]++} This bit creates an array a that gets the first field of each line as an index, and that gets incremented every time the same value is seen.
END{if(length(a)<=2){exit 1}} This checks how many elements are in the array. If there there are less than 3 (which should be the case if there's always the same date and we only get 1 header, 1 date) exit the processing with 1.
"$FILE" is part of the bash script, not awk, and I quoted your variable out of habit, should you ever have files w/ spaces in their names you'll see why :)
So, a "normal" file contains only two different lines:
timestamp
dd/mm/yyyy
Testing if a file is normal is thus as simple as:
[ $(sort -u file.csv | wc -l) -eq 2 ]
This leads to the following possible solution:
#!/usr/bin/env bash
mkdir -p dir_different
for FILE in *.csv;
do
if [ $(sort -u "$FILE" | wc -l) -ne 2 ] ; then
echo mv "$FILE" dir_different
fi
done

Linux : Move files that have more than 100 commas in one line

I have 100 files in a specific directory that contains several records with fields delimited with commas.
I need to use a Linux command that check the lines in each file
and if the line contains more than 100 comma move it to another directory.
Is it possible ?
Updated Answer
Although my original answer below is functional, Glenn's (#glennjackman) suggestion in the comments is far more concise, idiomatic, eloquent and preferable - as follows:
#!/bin/bash
mkdir subdir
for f in file*; do
awk -F, 'NF>100{exit 1}' "$f" || mv "$f" subdir
done
It basically relies on awk's exit status generally being 0, and then only setting it to 1 when encountering files that need moving.
Original Answer
This will tell you if a file has more than 100 commas on any line:
awk -F, 'NF>100{print 1;exit} END{print 0}' someFile
It will print 1 and exit without parsing the remainder of the file if the file has any line with more than 100, and print 0 at the end if it doesn't.
If you want to move them as well, use
#!/bin/bash
mkdir subdir
for f in file*; do
if [[ $(awk -F, 'NF>100{print 1;exit}END{print 0}' "$f") != "0" ]]; then
echo mv "$f" subdir
fi
done
Try this and see if it selects the correct files, and, if you like it, remove the word echo and run it again so it actually moves them. Back up first!

Prepend data from one file to another

How do I prepend the data from file1.txt to file2.txt?
The following command will take the two files and merge them into one
cat file1.txt file2.txt > file3.txt; mv file3.txt file2.txt
You can do this in a pipeline using sponge from moreutils:
cat file1.txt file2.txt | sponge file2.txt
Another way using GNU sed:
sed -i -e '1rfile1.txt' -e '1{h;d}' -e '2{x;G}' file2.txt
That is:
On line 1, append the content of the file file1.txt
On line 1, copy pattern space to hold space, and delete pattern space
On line 2, exchange the content of the hold and pattern spaces, and append the hold space to pattern space
The reason it's a bit tricky is that the r command appends content,
and line 0 is not addressable, so we have to do it on line 1,
moving the content of the original line out of the way and then bringing it back after the content of the file is appended.
If it's available on your system, then sponge from moreutils is designed for this. Here is an example:
cat file1.txt file2.txt | sponge file2.txt
If you don't have sponge, then the following script does the same job using a temporary file. It makes sure that the temporary file is not accessible by other users, and cleans it up at the end.
If your system, or the script crashes, you may need to clean up the temporary file manually. Tested on Bash 4.4.23, and Debian 10 (Buster) Gnu/Linux.
#!/bin/bash
#
# ----------------------------------------------------------------------------------------------------------------------
# usage [ from, to ]
# [ from, to ]
# ----------------------------------------------------------------------------------------------------------------------
# Purpose:
# Prepend the contents of file [from], to file [to], leaving the result in file [to].
# ----------------------------------------------------------------------------------------------------------------------
# check
[[ $# -ne 2 ]] && echo "[exit]: two filenames are required" >&2 && exit 1
# init
from="$1"
to="$2"
tmp_fn=$( mktemp -t TEMP_FILE_prepend.XXXXXXXX )
chmod 600 "$tmp_fn"
# prepend
cat "$from" "$to" > "$tmp_fn"
mv "$tmp_fn" "$to"
# cleanup
rm -f "$tmp_fn"
# [End]
The way of writing file is like 1). append at the end of the file or 2). rewrite that file.
If you want to put the content in file1.txt ahead of file2.txt, I'm afraid you need to rewrite the combined fine.

Quick unix command to display specific lines in the middle of a file?

Trying to debug an issue with a server and my only log file is a 20GB log file (with no timestamps even! Why do people use System.out.println() as logging? In production?!)
Using grep, I've found an area of the file that I'd like to take a look at, line 347340107.
Other than doing something like
head -<$LINENUM + 10> filename | tail -20
... which would require head to read through the first 347 million lines of the log file, is there a quick and easy command that would dump lines 347340100 - 347340200 (for example) to the console?
update I totally forgot that grep can print the context around a match ... this works well. Thanks!
I found two other solutions if you know the line number but nothing else (no grep possible):
Assuming you need lines 20 to 40,
sed -n '20,40p;41q' file_name
or
awk 'FNR>=20 && FNR<=40' file_name
When using sed it is more efficient to quit processing after having printed the last line than continue processing until the end of the file. This is especially important in the case of large files and printing lines at the beginning. In order to do so, the sed command above introduces the instruction 41q in order to stop processing after line 41 because in the example we are interested in lines 20-40 only. You will need to change the 41 to whatever the last line you are interested in is, plus one.
# print line number 52
sed -n '52p' # method 1
sed '52!d' # method 2
sed '52q;d' # method 3, efficient on large files
method 3 efficient on large files
fastest way to display specific lines
with GNU-grep you could just say
grep --context=10 ...
No there isn't, files are not line-addressable.
There is no constant-time way to find the start of line n in a text file. You must stream through the file and count newlines.
Use the simplest/fastest tool you have to do the job. To me, using head makes much more sense than grep, since the latter is way more complicated. I'm not saying "grep is slow", it really isn't, but I would be surprised if it's faster than head for this case. That'd be a bug in head, basically.
What about:
tail -n +347340107 filename | head -n 100
I didn't test it, but I think that would work.
I prefer just going into less and
typing 50% to goto halfway the file,
43210G to go to line 43210
:43210 to do the same
and stuff like that.
Even better: hit v to start editing (in vim, of course!), at that location. Now, note that vim has the same key bindings!
You can use the ex command, a standard Unix editor (part of Vim now), e.g.
display a single line (e.g. 2nd one):
ex +2p -scq file.txt
corresponding sed syntax: sed -n '2p' file.txt
range of lines (e.g. 2-5 lines):
ex +2,5p -scq file.txt
sed syntax: sed -n '2,5p' file.txt
from the given line till the end (e.g. 5th to the end of the file):
ex +5,p -scq file.txt
sed syntax: sed -n '2,$p' file.txt
multiple line ranges (e.g. 2-4 and 6-8 lines):
ex +2,4p +6,8p -scq file.txt
sed syntax: sed -n '2,4p;6,8p' file.txt
Above commands can be tested with the following test file:
seq 1 20 > file.txt
Explanation:
+ or -c followed by the command - execute the (vi/vim) command after file has been read,
-s - silent mode, also uses current terminal as a default output,
q followed by -c is the command to quit editor (add ! to do force quit, e.g. -scq!).
I'd first split the file into few smaller ones like this
$ split --lines=50000 /path/to/large/file /path/to/output/file/prefix
and then grep on the resulting files.
If your line number is 100 to read
head -100 filename | tail -1
Get ack
Ubuntu/Debian install:
$ sudo apt-get install ack-grep
Then run:
$ ack --lines=$START-$END filename
Example:
$ ack --lines=10-20 filename
From $ man ack:
--lines=NUM
Only print line NUM of each file. Multiple lines can be given with multiple --lines options or as a comma separated list (--lines=3,5,7). --lines=4-7 also works.
The lines are always output in ascending order, no matter the order given on the command line.
sed will need to read the data too to count the lines.
The only way a shortcut would be possible would there to be context/order in the file to operate on. For example if there were log lines prepended with a fixed width time/date etc.
you could use the look unix utility to binary search through the files for particular dates/times
Use
x=`cat -n <file> | grep <match> | awk '{print $1}'`
Here you will get the line number where the match occurred.
Now you can use the following command to print 100 lines
awk -v var="$x" 'NR>=var && NR<=var+100{print}' <file>
or you can use "sed" as well
sed -n "${x},${x+100}p" <file>
With sed -e '1,N d; M q' you'll print lines N+1 through M. This is probably a bit better then grep -C as it doesn't try to match lines to a pattern.
Building on Sklivvz' answer, here's a nice function one can put in a .bash_aliases file. It is efficient on huge files when printing stuff from the front of the file.
function middle()
{
startidx=$1
len=$2
endidx=$(($startidx+$len))
filename=$3
awk "FNR>=${startidx} && FNR<=${endidx} { print NR\" \"\$0 }; FNR>${endidx} { print \"END HERE\"; exit }" $filename
}
To display a line from a <textfile> by its <line#>, just do this:
perl -wne 'print if $. == <line#>' <textfile>
If you want a more powerful way to show a range of lines with regular expressions -- I won't say why grep is a bad idea for doing this, it should be fairly obvious -- this simple expression will show you your range in a single pass which is what you want when dealing with ~20GB text files:
perl -wne 'print if m/<regex1>/ .. m/<regex2>/' <filename>
(tip: if your regex has / in it, use something like m!<regex>! instead)
This would print out <filename> starting with the line that matches <regex1> up until (and including) the line that matches <regex2>.
It doesn't take a wizard to see how a few tweaks can make it even more powerful.
Last thing: perl, since it is a mature language, has many hidden enhancements to favor speed and performance. With this in mind, it makes it the obvious choice for such an operation since it was originally developed for handling large log files, text, databases, etc.
print line 5
sed -n '5p' file.txt
sed '5q' file.txt
print everything else than line 5
`sed '5d' file.txt
and my creation using google
#!/bin/bash
#removeline.sh
#remove deleting it comes move line xD
usage() { # Function: Print a help message.
echo "Usage: $0 -l LINENUMBER -i INPUTFILE [ -o OUTPUTFILE ]"
echo "line is removed from INPUTFILE"
echo "line is appended to OUTPUTFILE"
}
exit_abnormal() { # Function: Exit with error.
usage
exit 1
}
while getopts l:i:o:b flag
do
case "${flag}" in
l) line=${OPTARG};;
i) input=${OPTARG};;
o) output=${OPTARG};;
esac
done
if [ -f tmp ]; then
echo "Temp file:tmp exist. delete it yourself :)"
exit
fi
if [ -f "$input" ]; then
re_isanum='^[0-9]+$'
if ! [[ $line =~ $re_isanum ]] ; then
echo "Error: LINENUMBER must be a positive, whole number."
exit 1
elif [ $line -eq "0" ]; then
echo "Error: LINENUMBER must be greater than zero."
exit_abnormal
fi
if [ ! -z $output ]; then
sed -n "${line}p" $input >> $output
fi
if [ ! -z $input ]; then
# remove this sed command and this comes move line to other file
sed "${line}d" $input > tmp && cp tmp $input
fi
fi
if [ -f tmp ]; then
rm tmp
fi
You could try this command:
egrep -n "*" <filename> | egrep "<line number>"
Easy with perl! If you want to get line 1, 3 and 5 from a file, say /etc/passwd:
perl -e 'while(<>){if(++$l~~[1,3,5]){print}}' < /etc/passwd
I am surprised only one other answer (by Ramana Reddy) suggested to add line numbers to the output. The following searches for the required line number and colours the output.
file=FILE
lineno=LINENO
wb="107"; bf="30;1"; rb="101"; yb="103"
cat -n ${file} | { GREP_COLORS="se=${wb};${bf}:cx=${wb};${bf}:ms=${rb};${bf}:sl=${yb};${bf}" grep --color -C 10 "^[[:space:]]\\+${lineno}[[:space:]]"; }

Problem with Bash output redirection [duplicate]

This question already has answers here:
Why doesnt "tail" work to truncate log files?
(6 answers)
Closed 1 year ago.
I was trying to remove all the lines of a file except the last line but the following command did not work, although file.txt is not empty.
$cat file.txt |tail -1 > file.txt
$cat file.txt
Why is it so?
Redirecting from a file through a pipeline back to the same file is unsafe; if file.txt is overwritten by the shell when setting up the last stage of the pipeline before tail starts reading off the first stage, you end up with empty output.
Do the following instead:
tail -1 file.txt >file.txt.new && mv file.txt.new file.txt
...well, actually, don't do that in production code; particularly if you're in a security-sensitive environment and running as root, the following is more appropriate:
tempfile="$(mktemp file.txt.XXXXXX)"
chown --reference=file.txt -- "$tempfile"
chmod --reference=file.txt -- "$tempfile"
tail -1 file.txt >"$tempfile" && mv -- "$tempfile" file.txt
Another approach (avoiding temporary files, unless <<< implicitly creates them on your platform) is the following:
lastline="$(tail -1 file.txt)"; cat >file.txt <<<"$lastline"
(The above implementation is bash-specific, but works in cases where echo does not -- such as when the last line contains "--version", for instance).
Finally, one can use sponge from moreutils:
tail -1 file.txt | sponge file.txt
You can use sed to delete all lines but the last from a file:
sed -i '$!d' file
-i tells sed to replace the file in place; otherwise, the result would write to STDOUT.
$ is the address that matches the last line of the file.
d is the delete command. In this case, it is negated by !, so all lines not matching the address will be deleted.
Before 'cat' gets executed, Bash has already opened 'file.txt' for writing, clearing out its contents.
In general, don't write to files you're reading from in the same statement. This can be worked around by writing to a different file, as above:$cat file.txt | tail -1 >anotherfile.txt
$mv anotherfile.txt file.txtor by using a utility like sponge from moreutils:$cat file.txt | tail -1 | sponge file.txt
This works because sponge waits until its input stream has ended before opening its output file.
When you submit your command string to bash, it does the following:
Creates an I/O pipe.
Starts "/usr/bin/tail -1", reading from the pipe, and writing to file.txt.
Starts "/usr/bin/cat file.txt", writing to the pipe.
By the time 'cat' starts reading, 'file.txt' has already been truncated by 'tail'.
That's all part of the design of Unix and the shell environment, and goes back all the way to the original Bourne shell. 'Tis a feature, not a bug.
tmp=$(tail -1 file.txt); echo $tmp > file.txt;
This works nicely in a Linux shell:
replace_with_filter() {
local filename="$1"; shift
local dd_output byte_count filter_status dd_status
dd_output=$("$#" <"$filename" | dd conv=notrunc of="$filename" 2>&1; echo "${PIPESTATUS[#]}")
{ read; read; read -r byte_count _; read filter_status dd_status; } <<<"$dd_output"
(( filter_status > 0 )) && return "$filter_status"
(( dd_status > 0 )) && return "$dd_status"
dd bs=1 seek="$byte_count" if=/dev/null of="$filename"
}
replace_with_filter file.txt tail -1
dd's "notrunc" option is used to write the filtered contents back, in place, while dd is needed again (with a byte count) to actually truncate the file. If the new file size is greater or equal to the old file size, the second dd invocation is not necessary.
The advantages of this over a file copy method are: 1) no additional disk space necessary, 2) faster performance on large files, and 3) pure shell (other than dd).
As Lewis Baumstark says, it doesn't like it that you're writing to the same filename.
This is because the shell opens up "file.txt" and truncates it to do the redirection before "cat file.txt" is run. So, you have to
tail -1 file.txt > file2.txt; mv file2.txt file.txt
echo "$(tail -1 file.txt)" > file.txt
Just for this case it's possible to use cat < file.txt | (rm file.txt; tail -1 > file.txt)
That will open "file.txt" just before connection "cat" with subshell in "(...)". "rm file.txt" will remove reference from disk before subshell will open it for write for "tail", but contents will be still available through opened descriptor which is passed to "cat" until it will close stdin. So you'd better be sure that this command will finish or contents of "file.txt" will be lost
It seems to not like the fact you're writing it back to the same filename. If you do the following it works:
$cat file.txt | tail -1 > anotherfile.txt
tail -1 > file.txt will overwrite your file, causing cat to read an empty file because the re-write will happen before any of the commands in your pipeline are executed.

Resources