how to extract numbers in the same location from many log files - linux

I got an file test1.log
04/15/2016 02:22:46 PM - kneaddata.knead_data - INFO: Running kneaddata v0.5.1
04/15/2016 02:22:46 PM - kneaddata.utilities - INFO: Decompressing gzipped file ...
Input Reads: 69766650 Surviving: 55798391 (79.98%) Dropped: 13968259 (20.02%)
TrimmomaticSE: Completed successfully
04/15/2016 02:32:04 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /home/liaoming/kneaddata_v0.5.1/WGC066610D/WGC066610D_kneaddata.trimmed.fastq
04/15/2016 05:32:31 PM - kneaddata.utilities - DEBUG: 55798391 reads; of these:
55798391 (100.00%) were unpaired; of these:
55775635 (99.96%) aligned 0 times
17313 (0.03%) aligned exactly 1 time
5443 (0.01%) aligned >1 times
0.04% overall alignment rate
and the other files in the same format but different contents,like test2.log,test3.log to test60.log
I would like to extract two numbers from these files.For example the test1.log, the two numbers would be 55798391 55775635.
So the final generated file counts.txt would be something like this:
test1 55798391 55775635
test2 51000000 40000000
.....
test60 5000000 30000000

awk to the rescue!
$ awk 'FNR==9{f=$1} FNR==10{print FILENAME,f,$1}' test{1..60}.log
if not in the same directory, either call within a loop or create the file list and pipe to xargs awk
$ for i in {1..60}; do awk ... test$i/test$i.log; done
$ for i in {1..60}; do echo test$i/test$i.log; done | xargs awk ...

Related

changing file creation date by subtracting 4:00 hours

I have a file in c:/users/text.txt
This text.txt have a file creation date, how to get that text.txt creation date and subtract -4:00 using shell script
last_update=$(stat -c "%n %y" $file)
this statement is giving me the file creation date. How can i subtract -4:00 from it?
for suppose lets say text.txt file was created at 04/04/2019 4:00, i want to change that to 04/04/2019 12:00.
%y is not creation but last modification date.
Get the date as seconds since Epoch and subtract 4 hours from it, convert it to human readable form using date, change access and modification times of a file using touch:
$ stat file
File: file
Size: 0 Blocks: 0 IO Block: 4096 regular empty file
Device: b31ch/45852d Inode: 65386 Links: 1
Access: (0600/-rw-------) Uid: (10138/ u0_a138) Gid: (10138/ u0_a138)
Access: 2019-04-04 12:34:56.172954982 +0300
Modify: 2019-04-04 12:34:56.172954982 +0300
Change: 2019-04-04 12:34:56.172954982 +0300
Birth: -
$
$ stat -c '%y' file
2019-04-04 12:34:56.172954982 +0300
$ stat -c '%Y' file
1554370496
$ date -d #$(( $(stat -c '%Y' file) - 4*60*60 ))
Thu Apr 4 08:34:56 +03 2019
$ touch -d "$( date -d #$(( $(stat -c '%Y' file) - 4*60*60 )) )" file
$
$ stat file
File: file
Size: 0 Blocks: 0 IO Block: 4096 regular empty file
Device: b31ch/45852d Inode: 65386 Links: 1
Access: (0600/-rw-------) Uid: (10138/ u0_a138) Gid: (10138/ u0_a138)
Access: 2019-04-04 08:34:56.000000000 +0300
Modify: 2019-04-04 08:34:56.000000000 +0300
Change: 2019-04-04 12:37:14.492954929 +0300
Birth: -
See stat(1), date(1), and touch(1) for further information.

time command output on an already running process

I have a process that spawns some other processes,
I want to use the time command on a specific process and get the same output as the time command.
Is that possible and how?
I want to use the time command on a specific process and get the same output as the time command.
Probably it is enough just to use pidstat to get user and sys time:
$ pidstat -p 30122 1 4
Linux 2.6.32-431.el6.x86_64 (hostname) 05/15/2014 _x86_64_ (8 CPU)
04:42:28 PM PID %usr %system %guest %CPU CPU Command
04:42:29 PM 30122 706.00 16.00 0.00 722.00 3 has_serverd
04:42:30 PM 30122 714.00 12.00 0.00 726.00 3 has_serverd
04:42:31 PM 30122 714.00 14.00 0.00 728.00 3 has_serverd
04:42:32 PM 30122 708.00 16.00 0.00 724.00 3 has_serverd
Average: 30122 710.50 14.50 0.00 725.00 - has_serverd
If not then according to strace time uses wait4 system call (http://linux.die.net/man/2/wait4) to get information about a process from the kernel. The same info returns getrusage but you cannot call it for an arbitrary process according to its documentation (http://linux.die.net/man/2/getrusage).
So, I do not know any command that will give the same output. However it is feasible to create a bash script that gets PID of the specific process and outputs something like time outpus then
This script does these steps:
1) Get the number of clock ticks per second
getconf CLK_TCK
I assume it is 100 and 1 tick is equal to 10 milliseconds.
2) Then in loop do the same sequence of commands while exists the directory /proc/YOUR-PID:
while [ -e "/proc/YOUR-PID" ];
do
read USER_TIME SYS_TIME REAL_TIME <<< $(cat /proc/PID/stat | awk '{print $14, $15, $22;}')
sleep 0.1
end loop
Some explanation - according to man proc :
user time: ($14) - utime - Amount of time that this process has been scheduled in user mode, measured in clock ticks
sys time: ($15) - stime - Amount of time that this process has been scheduled in kernel mode, measured in clock ticks
starttime ($22) - The time in jiffies the process started after system boot.
3) When the process is finished get finish time
read FINISH_TIME <<< $(cat '/proc/self/stat' | awk '{print $22;}')
And then output:
the real time = ($FINISH_TIME-$REAL_TIME) * 10 - in milliseconds
user time: ($USER_TIME/(getconf CLK_TCK)) * 10 - in milliseconds
sys time: ($SYS_TIME/(getconf CLK_TCK)) * 10 - in milliseconds
I think it should give roughly the same result as time. One possible problem I see is if the process exists for a very short period of time.
This is my implementation of time:
#!/bin/bash
# Uses herestrings
print_res_jeffies()
{
let "TIME_M=$2/60000"
let "TIME_S=($2-$TIME_M*60000)/1000"
let "TIME_MS=$2-$TIME_M*60000-$TIME_S*1000"
printf "%s\t%dm%d.%03dms\n" $1 $TIME_M $TIME_S $TIME_MS
}
print_res_ticks()
{
let "TIME_M=$2/6000"
let "TIME_S=($2-$TIME_M*6000)/100"
let "TIME_MS=($2-$TIME_M*6000-$TIME_S*100)*10"
printf "%s\t%dm%d.%03dms\n" $1 $TIME_M $TIME_S $TIME_MS
}
if [ $(getconf CLK_TCK) != 100 ]; then
exit 1;
fi
if [ $# != 1 ]; then
exit 1;
fi
PROC_DIR="/proc/"$1
if [ ! -e $PROC_DIR ]; then
exit 1
fi
USER_TIME=0
SYS_TIME=0
START_TIME=0
while [ -e $PROC_DIR ]; do
read TMP_USER_TIME TMP_SYS_TIME TMP_START_TIME <<< $(cat $PROC_DIR/stat | awk '{print $14, $15, $22;}')
if [ -e $PROC_DIR ]; then
USER_TIME=$TMP_USER_TIME
SYS_TIME=$TMP_SYS_TIME
START_TIME=$TMP_START_TIME
sleep 0.1
else
break
fi
done
read FINISH_TIME <<< $(cat '/proc/self/stat' | awk '{print $22;}')
let "REAL_TIME=($FINISH_TIME - $START_TIME)*10"
print_res_jeffies 'real' $REAL_TIME
print_res_ticks 'user' $USER_TIME
print_res_ticks 'sys' $SYS_TIME
And this is an example that compares my implementation of time and real time:
>time ./sys_intensive > /dev/null
Alarm clock
real 0m10.004s
user 0m9.883s
sys 0m0.034s
In another terminal window I run my_time.sh and give it PID:
>./my_time.sh `pidof sys_intensive`
real 0m10.010ms
user 0m9.780ms
sys 0m0.030ms

Extract values from a text file in linux

I have a log file generated from sqlldr log file and I was wondering if I can write a shell to extract the following values from the log below using Linux. Thanks
Table_name: TEST
Row_load: 100
Row_fail: 10
Date_run: Feb 07, 2014
Table TEST:
100 Rows successfully loaded.
10 Rows not loaded due to data errors.
0 Rows not loaded because all WHEN clauses were failed.
0 Rows not loaded because all fields were null.
Bind array size not used in direct path.
Column array rows : 5000
Stream buffer bytes: 256000
Read buffer bytes: 1048576
Total logical records skipped: 0
Total logical records read: 14486
Total logical records rejected: 0
Total logical records discarded: 0
Total stream buffers loaded by SQL*Loader main thread: 3
Total stream buffers loaded by SQL*Loader load thread: 0
Run began on Fri Feb 07 12:21:24 2014
Run ended on Fri Feb 07 12:21:31 2014
Elapsed time was: 00:00:06.88
CPU time was: 00:00:00.11
If the structure of your log file is always same, then you can do the following with awk:
awk '
NR==1 { sub(/:/,x); print "Table_name: "$NF}
NR==2 { print "Row_load: " $1}
NR==3 { print "Row_fail: " $1}
/Run ended/ { print "Date_run: "$5 FS $6","$8}' file
Output
$ awk '
NR==1 { sub(/:/,x); print "Table_name: "$NF}
NR==2 { print "Row_load: " $1}
NR==3 { print "Row_fail: " $1}
/Run ended/ { print "Date_run: "$5 FS $6","$8}' file
Table_name: TEST
Row_load: 100
Row_fail: 10
Date_run: Feb 07,2014

Linux differences between consecutive lines

I need to loop trough n lines of a file and for any i between 1 and n - 1 to get the difference line(n - 1) - line(n).
And here is the source file:
root#syncro:/var/www# cat cron.log | grep "/dev/vda"
/dev/vda 20418M 14799M 4595M 77% /
/dev/vda 20418M 14822M 4572M 77% /
/dev/vda 20418M 14846M 4548M 77% /
/dev/vda 20418M 14867M 4527M 77% /
/dev/vda 20418M 14888M 4506M 77% /
/dev/vda 20418M 14910M 4484M 77% /
/dev/vda 20418M 14935M 4459M 78% /
/dev/vda 20418M 14953M 4441M 78% /
/dev/vda 20418M 14974M 4420M 78% /
/dev/vda 20418M 15017M 4377M 78% /
/dev/vda 20418M 15038M 4356M 78% /
root#syncro:/var/www# cat cron.log | grep "/dev/vda" | cut -b 36-42 | tr -d " M"
4595
4572
4548
4527
4506
4484
4459
4441
4420
4377
4356
those /dev/vda... lines are logged hourly with df -BM in cron.log file and the difference between lines will reveal the hourly disk consumption.
So, the expected output will be:
23 (4595 - 4572)
24 (4572 - 4548)
...
43 (4420 - 4377)
21 (4377 - 4356)
I don't need the text between ( and ), I put it here for explanation only.
I'm not sure if I got you correctly, but the following awk script should work:
awk '{if(NR>1){print _n-$4};_n=$4}' your.file
Output:
23
24
21
21
22
25
18
21
43
21
You don't need the other programs in the pipe. Just:
awk '/\/dev\/vda/ {if(c++>0){print _n-$4};_n=$4}' src/checkout-plugin/a.txt
will be enough. The regex on start of the awk scripts tells awk to apply the following block only to lines which match the pattern. A side effect is that NR can't be used anymore to detect the "second line" in which the calculation starts. I introduced a custome counter c for that purpose.
Also note that awk will remove the M on it's own, because the column has been used in a numeric calculation.

How to get file creation date/time in Bash/Debian?

I'm using Bash on Debian GNU/Linux 6.0. Is it possible to get the file creation date/time? Not the modification date/time.
ls -lh a.txt and stat -c %y a.txt both only give the modification time.
Unfortunately your quest won't be possible in general, as there are only 3 distinct time values stored for each of your files as defined by the POSIX standard (see Base Definitions section 4.8 File Times Update)
Each file has three distinct associated timestamps: the time of last
data access, the time of last data modification, and the time the file
status last changed. These values are returned in the file
characteristics structure struct stat, as described in <sys/stat.h>.
EDIT: As mentioned in the comments below, depending on the filesystem used metadata may contain file creation date. Note however storage of information like that is non standard. Depending on it may lead to portability problems moving to another filesystem, in case the one actually used somehow stores it anyways.
ls -i file #output is for me 68551981
debugfs -R 'stat <68551981>' /dev/sda3 # /dev/sda3 is the disk on which the file exists
#results - crtime value
[root#loft9156 ~]# debugfs -R 'stat <68551981>' /dev/sda3
debugfs 1.41.12 (17-May-2010)
Inode: 68551981 Type: regular Mode: 0644 Flags: 0x80000
Generation: 769802755 Version: 0x00000000:00000001
User: 0 Group: 0 Size: 38973440
File ACL: 0 Directory ACL: 0
Links: 1 Blockcount: 76128
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x526931d7:1697cce0 -- Thu Oct 24 16:42:31 2013
atime: 0x52691f4d:7694eda4 -- Thu Oct 24 15:23:25 2013
mtime: 0x526931d7:1697cce0 -- Thu Oct 24 16:42:31 2013
**crtime: 0x52691f4d:7694eda4 -- Thu Oct 24 15:23:25 2013**
Size of extra inode fields: 28
EXTENTS:
(0-511): 352633728-352634239, (512-1023): 352634368-352634879, (1024-2047): 288392192-288393215, (2048-4095): 355803136-355805183, (4096-6143): 357941248-357943295, (6144
-9514): 357961728-357965098
mikyra's answer is good. The fact just like what he said.
[jason#rh5 test]$ stat test.txt
File: `test.txt'
Size: 0 Blocks: 8 IO Block: 4096 regular empty file
Device: 802h/2050d Inode: 588720 Links: 1
Access: (0664/-rw-rw-r--) Uid: ( 500/ jason) Gid: ( 500/ jason)
Access: 2013-03-14 01:58:12.000000000 -0700
Modify: 2013-03-14 01:58:12.000000000 -0700
Change: 2013-03-14 01:58:12.000000000 -0700
if you want to verify, which file was created first, you can structure your file name by appending system date when you create a series of files.
Note that if you've got your filesystem mounted with noatime for performance reasons, then the atime will likely show the creation time. Given that noatime results in a massive performance boost (by removing a disk write for every time a file is read), it may be a sensible configuration option that also gives you the results you want.
Creation date/time is normally not stored. So no, you can't.
You can find creation time - aka birth time - using stat and also match using find.
We have these files showing last modified time:
$ ls -l --time-style=long-iso | sort -k6
total 692
-rwxrwx---+ 1 XXXX XXXX 249159 2013-05-31 14:47 Getting Started.pdf
-rwxrwx---+ 1 XXXX XXXX 275799 2013-12-30 21:12 TheScienceofGettingRich.pdf
-rwxrwx---+ 1 XXXX XXXX 25600 2015-05-07 18:52 Thumbs.db
-rwxrwx---+ 1 XXXX XXXX 148051 2015-05-07 18:55 AsAManThinketh.pdf
To find files created within a certain time frame using find as below.
Clearly, the filesystem knows about the birth time of a file:
$ find -newerBt '2014-06-13' ! -newerBt '2014-06-13 12:16:10' -ls
20547673299906851 148 -rwxrwx--- 1 XXXX XXXX 148051 May 7 18:55 ./AsAManThinketh.pdf
1407374883582246 244 -rwxrwx--- 1 XXXX XXXX 249159 May 31 2013 ./Getting\ Started.pdf
We can confirm this using stat:
$ stat -c "%w %n" * | sort
2014-06-13 12:16:03.873778400 +0100 AsAManThinketh.pdf
2014-06-13 12:16:04.006872500 +0100 Getting Started.pdf
2014-06-13 12:16:29.607075500 +0100 TheScienceofGettingRich.pdf
2015-05-07 18:32:26.938446200 +0100 Thumbs.db
stat man pages explains %w:
%w time of file birth, human-readable; - if unknown
ls -i menus.xml
94490 menus.xml
Here the number 94490 represents inode
Then do a:
df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg-root 4.0G 3.4G 408M 90% /
tmpfs 1.9G 0 1.9G 0% /dev/shm
/dev/sda1 124M 27M 92M 23% /boot
/dev/mapper/vg-var 7.9G 1.1G 6.5G 15% /var
To find the mounting point of the root "/" filesystem, because the file menus.xml is on '/' that is '/dev/mapper/vg-root'
debugfs -R 'stat <94490>' /dev/mapper/vg-root
The output may be like the one below:
debugfs -R 'stat <94490>' /dev/mapper/vg-root
debugfs 1.41.12 (17-May-2010)
Inode: 94490 Type: regular Mode: 0644 Flags: 0x0
Generation: 2826123170 Version: 0x00000000
User: 0 Group: 0 Size: 4441
File ACL: 0 Directory ACL: 0
Links: 1 Blockcount: 16
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x5266e438 -- Wed Oct 23 09:46:48 2013
atime: 0x5266e47b -- Wed Oct 23 09:47:55 2013
mtime: 0x5266e438 -- Wed Oct 23 09:46:48 2013
Size of extra inode fields: 4
Extended attributes stored in inode body:
selinux = "unconfined_u:object_r:usr_t:s0\000" (31)
BLOCKS:
(0-1):375818-375819
TOTAL: 2
Where you can see the creation time:
ctime: 0x5266e438 -- Wed Oct 23 09:46:48 2013
stat -c %w a.txt
%w returns the file creation(birth) date if it is available, which is rare.
Here's the link
As #mikyra explained, creation date time is not stored anywhere.
All the methods above are nice, but if you want to quickly get only last modify date, you can type:
ls -lit /path
with -t option you list all file in /path odered by last modify date.
If you really want to achieve that you can use a file watcher like inotifywait.
You watch a directory and you save information about file creations in separate file outside that directory.
while true; do
change=$(inotifywait -e close_write,moved_to,create .)
change=${change#./ * }
if [ "$change" = ".*" ]; then ./scriptToStoreInfoAboutFile; fi
done
As no creation time is stored, you can build your own system based on inotify.
Cited from https://unix.stackexchange.com/questions/50177/birth-is-empty-on-ext4/131347#131347 , the following shellscript would work to get creation time:
get_crtime() {
for target in "${#}"; do
inode=$(stat -c %i "${target}")
fs=$(df "${target}" | tail -1 | awk '{print $1}')
crtime=$(sudo debugfs -R 'stat <'"${inode}"'>' "${fs}" 2>/dev/null | grep -oP 'crtime.*--\s*\K.*')
printf "%s\t%s\n" "${target}" "${crtime}"
done
}
even better:
lsct ()
{
debugfs -R 'stat <'`ls -i "$1" | (read a b;echo -n $a)`'>' `df "$1" | (read a; read a b; echo "$a")` 2> /dev/null | grep --color=auto crtime | ( read a b c d;
echo $d )
}
lsct /etc
Wed Jul 20 19:25:48 2016
Another trick to add to your arsenal is the following:
$ grep -r "Copyright" /<path-to-source-files>/src
Generally speaking, if one changes a file they should claim credit in the “Copyright”. Examine the results for dates, file names, contributors and contact email.
example grep result:
/<path>/src/someobject.h: * Copyright 2007-2012 <creator's name> <creator's email>(at)<some URL>>

Resources