How to create dynamic headers on a text file using BASH script - linux

I have 5 big text files in a directory, each with millions of records delimited by pipes. All I want is that, when I run the BASH script, it creates a header on the first line of each file, like this:
TCR1|A|B|C|D|E|F|# of records
The first field (TCR1) is the new name of the file and the last field is the number of records; both should change for each text file. So, running the script once should find the 5 text files in the directory and add the header to each as described above. The output should look like this in each text file:
a.txt
TCR1|A|B|C|D|E|F|# of records in first text file
b.txt
TCR2|A|B|C|D|E|F|# of records in second text file
c.txt
TCR3|A|B|C|D|E|F|# of records in third text file
d.txt
TCR4|A|B|C|D|E|F|# of records in fourth text file
e.txt
TCR5|A|B|C|D|E|F|# of records in fifth text file

I think this is probably what you mean, though your question is very poorly posed:
#!/bin/bash
# Don't crash if no text files present and allow upper/lowercase "txt/TXT"
shopt -s nullglob nocaseglob
# Declare "lines" to be numeric, rather than string
declare -i lines
for f in *.txt; do
    lines=$(wc -l < "$f")
    echo "$f|A|B|C|D|E|F|$lines"
    cat "$f"
done
I don't understand the TCR thing, but maybe this is what you want:
#!/bin/bash
# Declare "lines" to be numeric, rather than string
declare -i lines
for f in *.txt; do
    lines=$(wc -l < "$f")
    TCRthing="unknown"
    [ "$f" == "a.txt" ] && TCRthing="TCR1"
    [ "$f" == "b.txt" ] && TCRthing="TCR2"
    [ "$f" == "c.txt" ] && TCRthing="TCR3"
    [ "$f" == "d.txt" ] && TCRthing="TCR4"
    [ "$f" == "e.txt" ] && TCRthing="TCR5"
    echo "$TCRthing|A|B|C|D|E|F|$lines"
    cat "$f"
done
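Both scripts above print the header and the file contents to stdout. If the header actually needs to be written into each file, here is a minimal sketch along the same lines; the TCRn numbering (taken from alphabetical glob order) and the use of a temporary file are my assumptions:
#!/bin/bash
shopt -s nullglob
n=0
for f in *.txt; do
    n=$((n + 1))                    # TCR number follows alphabetical glob order (assumption)
    lines=$(wc -l < "$f")
    # Write the header, then the original contents, into a temp file and swap it in
    { echo "TCR$n|A|B|C|D|E|F|$lines"; cat "$f"; } > "$f.tmp" && mv "$f.tmp" "$f"
done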
Note that there are simpler, more idiomatic ways of doing this, for example, you could just run:
more *.txt
and then press Ctrl-G to see which file you are viewing, where you are in it, and how many lines it has. You can also press :n to move to the next file and :p to move to the previous file, 1G to go back to the top of the current file, and G to go to the bottom of the current file.

Related

bash/awk/unix detect changes in lines of csv files

I have a timestamp in this format:
(normal_file.csv)
timestamp
19/02/2002
19/02/2002
19/02/2002
19/02/2002
19/02/2002
19/02/2002
The dates are usually uniform; however, some files have an irregular date pattern, such as this example:
(abnormal_file.csv)
timestamp
19/02/2002
19/02/2003
19/02/2005
19/02/2006
In my directory, there are hundreds of files, a mix of normal and abnormal ones like the two above.
I want to write a bash or awk script that detects the date pattern in every file of the directory. Abnormal files should be moved automatically to a new, separate directory (let's say dir_different/).
Currently, I have tried the following:
#!/bin/bash
mkdir dir_different
for FILE in *.csv; do
    # pipe 1: detect the changes in the line
    # pipe 2: print the timestamp column (first column, columns are comma-separated)
    awk '$1 != prev {print ; prev = $1}' < "$FILE" | awk -F , '{print $1}'
done
If the timestamp in a given file is normal, then only one single timestamp will be printed; but for abnormal files, multiple dates will be printed.
I am not sure how to separate the abnormal files from the normal files, and I have tried the following:
for FILE in *.csv; do
    output=$(awk 'FNR==3{print $0}' "$FILE")
    echo "${output}"
    if [[ ${output} =~ ([[:space:]]) ]]; then
        mv "$FILE" dir_different/
    fi
done
Or is there an easier method to detect changes in lines and separate files that have different lines? Thank you for any suggestions :)
Assuming that none of your "normal" CSV files have trailing newlines, this should do the separation just fine:
#!/bin/bash
mkdir -p dir_different
for FILE in *.csv; do
    if awk '{a[$1]++}END{if(length(a)<=2){exit 1}}' "$FILE" ; then
        echo mv "$FILE" dir_different
    fi
done
After a dry-run just get rid of the echo :)
Edit:
{a[$1]++} This bit creates an array a that gets the first field of each line as an index, and that gets incremented every time the same value is seen.
END{if(length(a)<=2){exit 1}} This checks how many elements are in the array. If there are fewer than 3 (which should be the case if there's always the same date and we only get 1 header plus 1 date), exit the processing with status 1.
"$FILE" is part of the bash script, not awk, and I quoted your variable out of habit, should you ever have files w/ spaces in their names you'll see why :)
So, a "normal" file contains only two different lines:
timestamp
dd/mm/yyyy
Testing if a file is normal is thus as simple as:
[ $(sort -u file.csv | wc -l) -eq 2 ]
This leads to the following possible solution:
#!/usr/bin/env bash
mkdir -p dir_different
for FILE in *.csv; do
    if [ $(sort -u "$FILE" | wc -l) -ne 2 ] ; then
        echo mv "$FILE" dir_different
    fi
done

List directories and their files grouping them on one line for tokenization

I want to group each directory name with its files in a bash script.
For example if I type ls /home/maindir/*
I get home/maindir/dir1: file1 file2\n file3
home/maindir/dir2: file1 file2
The directory listings are not separated by a predictable delimiter, because in some cases file1 and file2 in the same directory have a newline between them, so I want the directory name and its file list tokenized with a delimiter, all on one line.
Example output with newline delimiter:
home/maindir/dir1: file1 file2 file3\n
home/maindir/dir2: file1 file2\n
home/maindir/dir3: file1 file2 file4\n
I originally used an unquoted interpolation trick.
For example, if you have strings in a file, one per line, and you want them horizontalized, you don't have to use paste -
file named foo:
a
b
c
then you can say:
echo $(<foo)
and you get
a b c
But that could cause issues with filenames, especially if they have embedded special chars or whitespace.
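For instance (file names here are hypothetical), a name containing a space gets split into separate words by the unquoted expansion:
printf 'plain.txt\nhas space.txt\n' > names
echo $(<names)      # prints: plain.txt has space.txt   (looks like three names)
echo "$(<names)"    # quoted: keeps the newline, so the two names stay on separate lines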
Thanks to Gordon Davisson for a simple upgrade!
for d in /home/maindir/*          # includes full path each time
do  [[ -d "$d" ]] || continue     # ignore nondirectories
    cd "$d"                       # go there to make filenames path-bare
    echo "$d:" *
done
Note that this still includes subdirectories. Do you need to skip those?
If you want to be more careful -
for d in /home/maindir/*
do  [[ -d "$d" ]] || continue
    cd "$d"
    dir="$d: "
    hit=0
    for f in *
    do  if [[ -f "$f" ]]
        then hit=1
             dir="$dir $f "
        fi
    done
    (( $hit )) && printf '%s\n' "$dir"
done
This one should also work on files with embedded spaces &c.
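A related sketch (the paths and the nullglob choice are my assumptions) that collects the plain-file names into an array and prints one line per directory without changing the working directory:
shopt -s nullglob
for d in /home/maindir/*/; do
    names=()
    for f in "$d"*; do
        [[ -f $f ]] && names+=( "${f##*/}" )    # keep only regular files, strip the leading path
    done
    printf '%s: %s\n' "${d%/}" "${names[*]}"    # join the names with spaces on one line
done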

How can we increment a string variable within a for loop

#! /bin/bash
for i in $(ls); do
    j=1
    echo "$i"
done
Current output (not what I need):
autodeploy
bin
config
console-ext
edit.lok
I need output like the numbered directory list below, and based on that, if I give 2 as input it should print "bin":
1.)autodeploy
2.)bin
3.)config
4.)console-ext
5.)edit.lok
and if I give 2 as input, then it should print "bin".
Per BashFAQ #1, a while read loop is the correct way to read content line-by-line:
#!/usr/bin/env bash
enumerate() {
    local line i
    i=0
    while IFS= read -r line; do
        ((++i))
        printf '%d.) %s\n' "$i" "$line"
    done
}
ls | enumerate
However, ls is not an appropriate tool for programmatic use; the above is acceptable if the results of ls are only for human consumption, but not if they're going to be parsed by a machine -- see Why you shouldn't parse the output of ls(1).
If you want to list files and let the user choose among them by number, pass the results of a glob expression to select:
select filename in *; do
    echo "$filename" && break
done
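If select's menu format is not quite what you want, a minimal sketch of the same idea (variable names are mine) stores the glob in an array, numbers it, and prints the entry the user picks:
#!/usr/bin/env bash
entries=(*)                                   # glob into an array instead of parsing ls
for i in "${!entries[@]}"; do
    printf '%d.)%s\n' "$((i + 1))" "${entries[i]}"
done
read -rp "Enter a number: " n
printf '%s\n' "${entries[n - 1]}"             # e.g. entering 2 prints the second entry ("bin" in the example)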
I don't understand what you mean in your question by like Directory list, but following your example, you do not need to write a loop:
ls|nl -s '.)' -w 1
If you want to avoid ls, you can do the following (but be careful: this only works if the directory entries do not contain whitespace, because whitespace would make fmt break a name across two lines):
echo *|fmt -w 1 |nl -s '.)' -w 1
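A hypothetical variant that sidesteps the whitespace problem, since printf emits one glob match per line before numbering:
printf '%s\n' * | nl -s '.)' -w 1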

Linux : Move files that have more than 100 commas in one line

I have 100 files in a specific directory, each containing several records with fields delimited by commas.
I need a Linux command that checks the lines in each file and, if any line contains more than 100 commas, moves the file to another directory.
Is that possible?
Updated Answer
Although my original answer below is functional, Glenn's (#glennjackman) suggestion in the comments is far more concise, idiomatic, eloquent and preferable - as follows:
#!/bin/bash
mkdir subdir
for f in file*; do
    awk -F, 'NF>100{exit 1}' "$f" || mv "$f" subdir
done
It relies on awk's exit status defaulting to 0 and being set to 1 only when a file contains a line that needs moving, so the || triggers the mv for exactly those files.
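As a quick illustration of the exit-status idiom (the two sample files are hypothetical):
printf 'a,b,c\n' > few.csv                               # 3 fields, well under the limit
printf '%.0s,' {1..150} > many.csv && echo >> many.csv   # one line with 150 commas
awk -F, 'NF>100{exit 1}' few.csv  || echo "would move few.csv"     # prints nothing
awk -F, 'NF>100{exit 1}' many.csv || echo "would move many.csv"    # prints the message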
Original Answer
This will tell you if a file has more than 100 commas on any line:
awk -F, 'NF>100{print 1; found=1; exit} END{if (!found) print 0}' someFile
It will print 1 and exit without parsing the remainder of the file if any line has more than 100 fields, and print 0 at the end if none does (the found flag is needed because awk still runs the END block after exit).
If you want to move them as well, use
#!/bin/bash
mkdir subdir
for f in file*; do
    if [[ $(awk -F, 'NF>100{print 1; found=1; exit} END{if (!found) print 0}' "$f") != "0" ]]; then
        echo mv "$f" subdir
    fi
done
Try this and see if it selects the correct files, and, if you like it, remove the word echo and run it again so it actually moves them. Back up first!

using linux cmds to match a list names against a bunch of files

I have a slight problem with matching file names.
I have one text file that contains some 160 names, and a folder with 2000+ files, some of whose names contain those 160 names. I am looking for a grep command that can take each name in the text file and try to match it against the contents of the folder.
I have tried doing this in Perl and with straightforward Linux commands, but neither has worked out very well for me because I am not familiar with either.
so for example: the text file contains
abc acc eee fff
and the folder will have abcXXX, accXXX, eeeXXX and fffXXX
I need to sort through the list and find out which ones are missing.
thx
Davy
If you are searching in the content of the files:
#!/bin/sh
for i in `cat files`
do
    grep -R "$i" folder --color
done
and if you are searching the file names:
#!/bin/sh
for i in `cat files`
do
    find . -name "${i}*"
done
for file in $(< list); do
    [ ! -f ${file}xxx ] && echo "x: " ${file}xxx
done
list is the file, containing the list of filenames "abc acc ...".
< is redirection - so we read from the file 'list', the same as $(cat list). If the file isn't named 'list', just replace the name.
file is declared in that statement and iteratively bound to all those entries in 'list'. Later it is used as ${file}.
[ ! -f ${file}xxx ] is a test, whether a -f(ile) exists; for abc it searches for a file abcxxx.
But ! negates the test, so if no such file exists, then echo ... is called. "x: " is just a debug relic.
We can improve that part:
for file in $(< list); do
    [ -f ${file}xxx ] || echo ${file}xxx
done
instead of 'NOT x AND y' we can write 'x OR y' - the meaning is the same, just shorter: the file does exist or echo its name.
|| is short-circuit OR.
If you can arrange for the text file to have each name on a separate line then the following command will do what you need:
ls myfolder | grep -f mytextfile
One way to get each name on a separate line in the text file would be to edit a copy in vi and issue the commands:
:%s/ /^V^M/g
:wq
(^V^M means type "control-v" then "control-m")
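Alternatively (not from the original answers, just a sketch reusing the same file names), tr can do the one-name-per-line conversion without opening an editor, and an inverted check lists the missing names:
tr -s ' ' '\n' < mytextfile > mytextfile.lines
ls myfolder | grep -f mytextfile.lines                         # files whose names match
while read -r name; do
    ls myfolder | grep -q "^$name" || echo "$name is missing"
done < mytextfile.lines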
