How not to process files which were already processed before? (they will not be zipped or renamed)

How not to process files which were already processed before? (they will not be zipped or renamed) - linux

I have read-only access to the folder containing lot of logs with names starting with SystemOut*:
SystemOut_15.03.12_1215124.log
SystemOut_15.03.12_23624.log
SystemOut_15.03.02_845645.log
SystemOut_15.03.14_745665.log
SystemOut_15.03.16_456457.log
SystemOut_15.03.07_474574.log
The logs are not zipped or renamed.
What I need to implement is to parse them in such a way that the logs already processed will not be processed again. Also, the mandatory condition is not to process the log with the latest modification date&time.
I would potentially think I need to create a separate file on a location I have write access with the log names my script has already processed?
Grateful if you could provide some suggestions and how to implement them.
Thanks

I agree keeping track of the logs you have already processed in a separate file is a good idea. It's not clear from your question how you will identify the current log, so I leave that in your court.
Try something like this:
mysavedfiles=/some/path/file.txt
curfile=$(ls -tr | tail -n 1)
for fn in logfiles/*.log
do
if ! grep -q $fn $mysavedfiles && [ "$fn" != "$curfile" ]
then
... process it ...
echo $fn >>$mysavedfiles
fi
done
You could also exclude the last file by changing to a while read loop fed by some processing:
ls -tr logfile/*.log | head -n -1 | while read fn
do
....
done

Related

Execute p4 aliased command result in messed wrong order of lines in output

I have the following content inside my p4aliases.txt.
diff-cl $(target-cl) = diff -dl //...#$(EQ)$(target-cl)
Basically it diffs against your files in current workspace toward the target shelved files of changelist.
It is fine. I can execute it. But when I compare the result coming from above aliased command against the direct raw (non-aliased) command as follows
p4 diff -dl //...#=<target-cl>
the output lines of text from aliased command is in wrong order e.g. changes according to a certain file shows up first before a line of file shown, line orders are messed up. This is not the case if you execute with a non-aliased command.
Example
Expected result
==== //depot/common.h#none - x:\mydir\project\src\common.h ====
==== //depot/file.cpp#none - x:\mydir\project\src\file.cpp ====
3a4
> added line 1
==== //depot/file.h#none - x:\mydir\project\src\file.h ====
Actual result
3a4
> added line 1
==== //depot/common.h#none - x:\mydir\project\src\common.h ====
==== //depot/file.cpp#none - x:\mydir\project\src\file.cpp ====
==== //depot/file.h#none - x:\mydir\project\src\file.h ====
I have p4 version as of Rev. P4/NTX64/2021.1/2126753 (2021/05/12).
Perforce server version (got from p4 info) is Server version: P4D/LINUX26X86_64/2017.1/1574018 (2017/10/02).
How can I solve this issue?
Could this be a version too far away between client and server
Update
I have tested p4 client all the way down from 2016-2020 version by downloading old binaries from ftp.perforce.com (in directory perforce). No luck. Output still messed the same. So it's not the problem about version mismatch.

This looks like a bug in the p4 client. When the client does a diff, it's written by the ClientUser::Diff() method, which defaults to writing to stdout (i.e. it does not route the output through ClientUser::OutputText()):
https://workshop.perforce.com/projects/perforce_software-p4/files/2018-2/client/clientuser.cc#436
https://workshop.perforce.com/projects/perforce_software-p4/files/2018-2/client/clientuser.cc#573
Output from commands run as part of an alias go through the ClientUserStrBuf subclass, which buffers all of its output. The file headers, for example, are buffered by ClientUserStrBuf::OutputInfo():
https://workshop.perforce.com/projects/perforce_software-p4/files/2018-2/client/clientaliases.cc#1647
There isn't a ClientUserStrBuf::Diff() implementation, though, so that diff output goes straight to stdout while the headers are buffered and printed at the end (presumably after some post-processing) -- hence the diff output showing up first in the console.
The fix I'd make would be to have the base ClientUser::Diff() implementation route the output through OutputText() when no output file is provided, which seems like the least-surprise behavior; that'd fix the aliases behavior and might even make life a little easier for other client developers who would otherwise hit the same issue. If you have a support contract with Perforce you can file this as a bug report, or since the client is open source you can take a crack at fixing and building it yourself. I don't think there's a workaround that doesn't involve modifying the client source code.

Samwise has the correct approach to truly fix the problem at hands although it might take some effort to understand the code, and conduct the fix itself.
At any rate, if we took such approach we won't be able to take benefits of bug fixes and future updates as we will be stuck with 2018-2 version of p4 as it's the latest as it can be in which we can grab the source.
I would recommend to use WSL then interact with p4.exe (yes, a Windows-based binary) for Windows-based project, and p4 for Linux-based binary. If you didn't use WSL, please find the .bash_aliases-like solution as I have below to seamlessly solve aliases diff operation.
Put the following code into your ~/.bash_aliases
# p4 - fix of aliases diff operation
# platform independent, it will choose a correct binary path to execute properly
p4() { cmd="p4.exe" # default is Windows-based
# get the last argument value, if "-lx" passed in then we know it's linux
if [[ "${#: -1}" == "-lx" ]]; then
cmd="/usr/local/bin/p4"
fi
if [[ $1 == "diff-cl" ]]; then
if [ -z "$2" ]; then
echo "usage: p4 diff-cl <CL>"
return 1
fi
$cmd diff -dl //...#=$2 | diffp4 | less -r
elif [[ $1 == "diff-cl-fonly" ]]; then
if [ -z "$2" ]; then
echo "usage: p4 diff-cl-fonly <CL>"
return 1
fi
$cmd diff -Od -dl -ds //...#=$2 | diffp4 | grep ==== | less -r
else
$cmd "$#"
fi
}
then source ~/.bash_aliases.
What it does is to allow you to still use p4 with all of its original commands & arguments normally with exceptions of diff-cl (which is the same name of alias I've put into p4aliases.txt for Windows or ~/.p4aliases for Linux). You can safely remove diff-cl entry from p4's alias file, or just leave it there. What we have in ~/.bash_aliases file will intercept whenever such argument matches then execute the raw command, just that we don't have to type long command ourselves.
We can later remove such section in our ~/.bash_aliases file when upstream p4 has been fixed.
In else section, we just relay the the whole arguments, and it will be performed just as normally done.
Extra: diff-cl-fonly to list out only files (its depot path, and local workspace path) which have changes.

balancing the bash calculations

We have a tool for cutting adaptors https://github.com/vsbuffalo/scythe/blob/master/README.md and we wanted it to be used on all the files in the raw folder and make an output of each file separately as OUT+File Name.
Something is wrong with this script I wrote, because it doesn't take each file separately, and the whole thing doesn't work properly. It's gonna generateing empty file named OUT+files
Expected operation will looks:
take file1, use scythe on it, write output as OUTfile1
take file2 etc.
#!/bin/bash
FILES=/home/dave/raw/*
for f in $FILES
do
echo "Processing the $f file..."
/home/deve/scythe/scythe -a /home/dev/scythe/illumina_adapters.fa -o "OUT"+$f $f
done
Additionally, I noticed (testing for a single file) that the script uses only one core out of 130 available. Is there any way to improve it?

There is no string concatenation operator in shell. Use juxtaposition instead; it's "OUT$f", not "OUT"+$f.

One liner to append a file into another file but only if it hasn't already been added

I have an automated process that has a number of lines like the following pattern:
sudo cat /some/path/to/a/file >> /some/other/file
I'd like to transform that into a one liner that will only append to /some/other/file if /some/path/to/a/file has not already been added.
Edit
It's clear I need some examples here.
example 1: Updating a .bashrc script for a specific login
example 2: Creating a .screenrc for different logins
example 3: Appending to the end of a /etc/ config file
Some other caveats. The text is going to be added in a block (>>). Consequently, it should be relatively straight forward to see if the entire code block is added or not near the end of a file. I am trying to come up with a simple method for determining whether or not the file has already been appended to the original.
Thanks!
Example python script...
def check_for_appended(new_file, original_file):
""" Checks original_file to see if it has the contents of new_file """
new_lines = reversed(new_file.split("\n"))
original_lines = reversed(original_file.split("\n"))
appended = None
for new_line, orig_line in zip(new_lines, original_lines):
if new_line != orig_line:
appended = False
break
else:
appended = True
return appended

Maybe this will get you started - this GNU awk script:
gawk -v RS='^$' 'NR==FNR{f1=$0;next} {print (index($0,f1) ? "present" : "absent")}' file1 file2
will tell you if the contents of "file1" are present in "file2". It cannot tell you why, e.g. because you previously concatenated file1 onto the end of file2.
Is that all you need? If not update your question to clarify/explain.

Here's a technique to see if a file contains another file
contains_file_in_file() {
local small=$1
local big=$2
awk -v RS="" '{small=$0; getline; exit !index($0, small)}' "$small" "$big"
}
if ! contains_file_in_file /some/path/to/a/file /some/other/file; then
sudo cat /some/path/to/a/file >> /some/other/file
fi

EDIT: Op just told me in the comments that the files he wants to concatenate are bash scripts -- this brings us back to the good ole C preprocessor include guard tactics:
prepend every file with
if [ -z "$__<filename>__" ]; then __<filename>__=1; else
(of course replacing <filename> with the name of the file) and at the end
fi
this way, you surround the script in each file with a test for something that's only true once.

Does this work for you?
sudo (set -o noclobber; date > /tmp/testfile)
noclobber prevents overwriting an existing file.
I think it doesn't, since you wrote you want to append something but this technique might help.
When the appending all occurs in one script, then use a flag:
if [ -z "${appended_the_file}" ]; then
cat /some/path/to/a/file >> /some/other/file
appended_the_file="Yes I have done it except for permission/right issues"
fi
I would continue into writing a function appendOnce { .. }, with the content above. If you really want an ugly oneliner (ugly: pain for the eye and colleague):
test -z "${ugly}" && cat /some/path/to/a/file >> /some/other/file && ugly="dirt"
Combining this with sudo:
test -z "${ugly}" && sudo "cat /some/path/to/a/file >> /some/other/file" && ugly="dirt"

It appears that what you want is a collection of script segments which can be run as a unit. Your approach -- making them into a single file -- is hard to maintain and subject to a variety of race conditions, making its implementation tricky.
A far simpler approach, similar to that used by most modern Linux distributions, is to create a directory of scripts, say ~/.bashrc.d and keep each chunk as an individual file in that directory.
The driver (which replaces the concatenation of all those files) just runs the scripts in the directory one at a time:
if [[ -d ~/.bashrc.d ]]; then
for f in ~/.bashrc.d/*; do
if [[ -f "$f" ]]; then
source "$f"
fi
done
fi
To add a file from a skeleton directory, just make a new symlink.
add_fragment() {
if [[ -f "$FRAGMENT_SKELETON/$1" ]]; then
# The following will silently fail if the symlink already
# exists. If you wanted to report that, you could add || echo...
ln -s "$FRAGMENT_SKELETON/$1" "~/.bashrc.d/$1" 2>>/dev/null
else
echo "Not a valid fragment name: '$1'"
exit 1
fi
}
Of course, it is possible to effectively index the files by contents rather than by name. But in most cases, indexing by name will work better, because it is robust against editing the script fragment. If you used content checks (md5sum, for example), you would run the risk of having an old and a new version of the same fragment, both active, and without an obvious way to remove the old one.
But it should be straight-forward to adapt the above structure to whatever requirements and constraints you might have.
For example, if symlinks are not possible (because the skeleton and the instance do not share a filesystem, for example), then you can copy the files instead. You might want to avoid the copy if the file is already present and has the same content, but that's just for efficiency and it might not be very important if the script fragments are small. Alternatively, you could use rsync to keep the skeleton and the instance(s) in sync with each other; that would be a very reliable and low-maintenance solution.

Rollover shell script

Assuming a shell script(commands.sh) with few commands.
I need to write a script which sends the output of commands executed by commands.sh to a file f1.csv
if file size exceeds 1MB then the output flowing should go to file f2.csv
if the file size exceeds 1 mb again here,the output flowing should go to file f3.csv
if f3.csv exceeds the size 1mb,then the older f1 should be deleted and again new file f1 should be created,
output flowing should be to written to f1. This process should go on .
I can write the crontab file, just the shell script is a bit tricky
I have been experimenting:
#!/usr/bin/env bash
PREFIX="f"
# Maximum size after which you want a new file in bytes
MAX_SIZE=1048576
LAST_FILE=`ls "$prefix"*.csv | tail -1`
# Check if file exists and if it does not, create it.
if [[ -z "$LAST_FILE" ]]
then
LAST_FILE=$PREFIX"1.csv"
touch $LAST_FILE
fi
LAST_FILE_NO=`echo $LAST_FILE | sed s/$PREFIX/''/ | sed s/.csv/''/`
LAST_FILE_SIZE=`stat -c %s $LAST_FILE`
if [ `stat -c %s $LAST_FILE` -lt 200 ]
then
`/bin/sh ./sam.sh >> $LAST_FILE`
else
UPCOMING_FILE_NO=$((LAST_FILE_NO+1))
`/bin/sh ./sam.sh >> $PREFIX$UPCOMING_FILE_NO.csv`
fi
help is appreciated guys.
EDIT: Have got the secondary shell script to work too...
Now if anyone could help me with resetting after 3 files are done and starting from f1.
thanks

It sounds like you'd be better off using logrotate, depending on how your script is running. If you are running 'commands.sh' on a cron, you can have logrotate rotate out the logs. There is a good guide on logrotate here:
http://linuxers.org/howto/howto-use-logrotate-manage-log-files
If your commands.sh isn't going to be on a cron, meaning it's not a regular time interval that triggers it, you could manually set up a log rotation at the beginning of your script. I once had to do something similar. I found this guide really useful:
http://wazem.blogspot.com/2013/11/simple-bash-log-rotate-function.html

how to print the ouput/error to a text file?

I'm trying to redirect(?) my standard error/output to a text file.
I did my research, but for some reason the online answers are not working for me.
What am I doing wrong?
cd /home/user1/lists/
for dir in $(ls)
do
(
echo | $dir > /root/user1/$dir" "log.txt
) > /root/Desktop/Logs/Update.log
done
I also tried
2> /root/Desktop/Logs/Update.log
1> /root/Desktop/Logs/Update.log
&> /root/Desktop/Logs/Update.log
None of these work for me :(
Help please!

Try this for the basics:
echo hello >> log.txt 2>&1
Could be read as: echo the word hello, redirecting and appending STDOUT to the file log.txt. STDERR (file descriptor 2) is redirected to wherever STDOUT is being pointed. Note that STDOUT is the default and thus there is no "1" in front of the ">>". Works on the current line only.
To redirect and append all output and error of all commands in a script, put this line near the top. It will be in effect for the length of the script instead of doing it on each line:
exec >>log.txt 2>&1

If you are trying to obtain a list of the files in /home/user1/lists, you do not need a loop at all:
ls /home/usr1/lists/ >Update.log
If you are attempting to run every file in the directory as an executable with a newline as its input, and collect the output from all these programs in Update.log, try this:
for file in /home/user1/lists/*; do
echo | "$file"
done >Update.log
(Notice how we avoid the useless use of ls and how there is no redirection inside the loop.)
If you want to create an empty file called *.log.txt for each file in the directory, you would do
for file in /home/user1/lists/*; do
touch "$(basename "$file")"log.txt
done
(Using basename to obtain the file name without the directory part avoids the cd but you could do it the other way around. Generally, we tend to avoid changing the directory in scripts, so that the tool can be run from anywhere and generate output in the current directory.)
If you want to create a file containing a single newline, regardless of whether it already exists or not,
for file in /home/user1/lists/*; do
echo >"$(basename "$file")"log.txt
done
In your original program, you redirect the echo inside the loop, which means that the redirection after done will not receive any output at all, so the created file will be empty.
These are somewhat wild guesses at what you might actually be trying to accomplish, but should hopefully help nudge you slightly in the right direction. (This should properly be a comment, I suppose, but it's way too long and complex.)

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How not to process files which were already processed before? (they will not be zipped or renamed) - linux

Related

Execute p4 aliased command result in messed wrong order of lines in output

balancing the bash calculations

One liner to append a file into another file but only if it hasn't already been added

Rollover shell script

how to print the ouput/error to a text file?

Categories

Resources