I need to find all *.xml files on Linux whose contents match a pattern. The name of each matching file should be printed on the screen, and then the pattern should be changed in the file that was just found.
For instance.
I can start the script with one argument for the keyword and one for the value, e.g.
script.sh keyword "another word"
Script should find all files with keyword and do the following changes in the files containing keyword.
<keyword></keyword> should be the same <keyword></keyword>
<keyword>some word</keyword> should be like this <keyword>some word, another word</keyword>
In other words, if the value in the keyword node was initially empty, I don't need to change it; if it contains some value, I need to extend it with the value I specify.
What is the best way to do this on Linux? Using find, grep, sed?
Performance is also important, since there are thousands of files.
Thank you.
A combination of find, grep and sed should do this. They are pretty fast because they do plain text processing, so there might be no need for real XML processing. If you could give an example or rephrase your question, I might be able to provide more help.
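A rough sketch of that combination follows. It assumes GNU sed (for -i), that each keyword element sits on a single line, that file names contain no newlines, and that neither argument contains characters special to sed such as |, & or \. The function name extend_keyword is made up for illustration.

```shell
# extend_keyword DIR KEYWORD VALUE  (hypothetical helper)
# Prints the name of every *.xml file under DIR that has a non-empty
# <KEYWORD> element, then appends ", VALUE" to that element's text.
extend_keyword() {
    dir=$1 keyword=$2 value=$3
    # grep -l lists only the files where <KEYWORD> is followed by some text
    find "$dir" -type f -name '*.xml' -exec grep -l "<$keyword>[^<]" {} + |
    while IFS= read -r file; do
        printf '%s\n' "$file"
        # [^<][^<]* requires at least one character, so empty elements are left alone
        sed -i "s|<$keyword>\([^<][^<]*\)</$keyword>|<$keyword>\1, $value</$keyword>|g" "$file"
    done
}
```

Called as extend_keyword . keyword "another word", this leaves <keyword></keyword> unchanged and turns <keyword>some word</keyword> into <keyword>some word, another word</keyword>. Passing many files to one grep with {} + keeps the number of processes low, which matters with thousands of files.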
Question: using the less command in any Linux shell (I'm using bash, as most people probably do), is there a way to search a file only for its commands or options?
So, to be more precise:
if I want to quickly find the description for one special option in a man page,
is there a special search syntax to quickly jump to the corresponding line explaining that specific command?
example:
if I type:
man less
and I want to quickly find the description for the "-q" command,
is there a search syntax to directly jump to that line?
If I type /-q, it finds all occurrences of "-q" everywhere in the file, so I get around 10-20 hits, of which only one is the one I was looking for.
So I'm just hoping there is a better/quicker way to do this.
(not too important though :D)
In man, options are generally described with the option name in bold at the start of the line.
So, if you are looking for the option -q, then the search command would be /^\s*-q\>
The regex ^\s*-q\> reads as follows:
^ start of a line
\s* any number of whitespace characters (including none)
-q the option name you are looking for
\> the end of the word
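The same expression can be tried outside the pager. A small sketch, assuming GNU grep (which understands \s and \>); find_option is a hypothetical helper name, not a standard command.

```shell
# find_option OPTION  (hypothetical helper)
# Reads text on stdin and prints, with line numbers, the lines where
# OPTION appears as a whole word at the start of a line after optional whitespace.
find_option() {
    grep -n -e "^\s*$1\>"
}

# Possible use on a man page (col -b strips overstrike formatting):
#   man less | col -b | find_option -q
```

The \> keeps a longer option that merely starts with the same letters from matching.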
I know how to match text using regex patterns but not how to manipulate them.
I have used grep to match and extract lines from a text file, but I want to remove those lines from the text. How can I achieve this without having to write a python or bash shell script?
I have searched on Google and was recommended to use sed, but I am new to it and don't know how it works.
Can anyone point me in the right direction or help me achieve this goal?
The -v option to grep inverts the search, reporting only the lines that don't match the pattern.
Since you know how to use grep to find the lines to be deleted, using grep -v and the same pattern will give you all the lines to be kept. You can write that to a temporary file and then copy or move the temporary file over the original.
grep -v pattern original.file > tmp.file
mv tmp.file original.file
You can also use sed, as shown in shellfish's answer.
There are multiple possible refinements for the grep solution, but for most people most of the time, what is shown is more or less adequate. It would be a good idea to use a per-process intermediate file name, preferably a random one such as the mktemp command gives you. You can also add code to remove the intermediate file on an interrupt; suppress interrupts while moving the file back; use copy and remove instead of move if the original file has multiple hard links or is a symlink; etc. The sed command more or less works around these issues for you, but it is not cognizant of multiple hard links or symlinks.
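Two of those refinements (a mktemp name, and copy and remove instead of move) might look like the sketch below. remove_lines is a made-up helper name, and interrupt handling is left out to keep it short.

```shell
# remove_lines PATTERN FILE  (hypothetical helper)
# Deletes the lines of FILE that match PATTERN, going through a temp file.
remove_lines() {
    pattern=$1 file=$2
    tmp=$(mktemp) || return 1
    grep -v -e "$pattern" -- "$file" > "$tmp"
    status=$?
    # grep exits 1 when it prints nothing (every line matched); only >1 is an error
    if [ "$status" -gt 1 ]; then
        rm -f "$tmp"
        return "$status"
    fi
    # Copy back instead of mv so hard links, symlinks and permissions survive
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

Writing back with cat keeps the original inode, at the cost of a short window in which the file is only partly written.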
Create the pattern which matches the lines using grep. Then create a sed script as follows:
sed -i '/pattern/d' file
Explanation:
The -i option means edit the input file in place, so the lines matching the pattern are removed from the file itself.
pattern is the pattern you created for grep, e.g. ^a*b\+.
d is the sed command for delete; it deletes the lines matching the pattern.
file is the input file, given as a relative or absolute path.
For more information see man sed.
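Since -i overwrites the file, a cautious variant keeps a backup. A minimal sketch, assuming GNU sed (where a suffix attached to -i names the backup copy) and a pattern that contains no / characters; delete_lines is a hypothetical helper name.

```shell
# delete_lines PATTERN FILE  (hypothetical helper)
delete_lines() {
    # -i.bak edits FILE in place and saves the original as FILE.bak
    sed -i.bak "/$1/d" "$2"
}
```

For example, delete_lines '^a*b\+' file removes the matching lines from file and leaves the untouched original in file.bak.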
I know that this is quite an easy thing for any advanced Vim programmer, but I have been trying to find a solution for a couple of hours now.
In my results file, there are certain lines like:
/Users/name/Project/Task1/folder1 : INFO : Random Info message
Here, /Users/name/Project/Task1/folder1 is my pwd, i.e. the present working directory.
I want to replace all the occurrences of my pwd above in the file with 'USER'. How can I do that?
:%s#/Users/name/Project/Task1/folder1#USER#g
or
:%s#<C-r>=getcwd()<CR>#USER#g
If I understand you correctly, you can simply use the search and replace functionality and escape the / characters like this (the trailing g replaces every occurrence on a line, not just the first):
:%s/\/Users\/name\/Project\/Task1\/folder1/USER/g
If you need to do this for several working directories (and thus want the path to be filled in dynamically), it is probably easier to use something like sed:
sed "s~$(pwd)~USER~" < file
Note that the ~ is used as a delimiter for the command instead of the /, this way we do not need to escape the / in the path.
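Wrapped as a small function, with a g flag added so that every occurrence on a line is replaced. This is only a sketch: mask_cwd is a made-up name, and it assumes the directory path contains no ~ and no characters that sed would treat as regex operators in a surprising way.

```shell
# mask_cwd FILE  (hypothetical helper)
# Prints FILE with each occurrence of the current directory replaced by USER.
mask_cwd() {
    # ~ is the delimiter, so the slashes in the path need no escaping
    sed "s~$(pwd)~USER~g" "$1"
}
```

The result goes to standard output; redirect it to a new file to keep it.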
In Linux, how do I use find with regular expressions (or something similar, without writing a script) to search for files with multiple "dots" in the name, IGNORING the extension?
For example, a search through the following files should return only the second file. In this example ".ext" is the extension.
testing1234hellothisisafile.ext
testing.1234.hello.this.is.a.file.ext
The solution should work with one or more dots in the file name (ignoring the extension dot). It should also work for any file, i.e. with any file extension.
Thanks in advance
So if I understand correctly, you want to get the filenames with at least two additional dots in the name. This would do:
$ find -regex ".*\.+[^.]*\.+[^.]*\.+.*"
./testing.1234.hello.this.is.a.file.ext
./testing1234.hellothisisafile.ext
$ find -regex ".*\.+[^.]*\.+[^.]*\.+[^.]*\.+.*"
./testing.1234.hello.this.is.a.file.ext
The key dot-detecting part is \.+ (at least one dot), coupled with [^.]*, which matches whatever separates the dots. It excludes dots, which the previous part already covers; this is a safety measure against greedy matching. Together they make the core part of the regex: we don't care what is before or after, just that somewhere there are three dots. Three, because the dot from the current dir (the leading ./) also counts; if you'll be searching from elsewhere, remove one \.+[^.]* group:
$ find delme/ -regex ".*\.+[^.]*\.+[^.]*\.+[^.]*\.+.*"
delme/testing.1234.hello.this.is.a.file.ext
$ find delme/ -regex ".*\.+[^.]*\.+[^.]*\.+.*"
delme/testing.1234.hello.this.is.a.file.ext
In this case the result is the same, since the name contains a lot of dots, but the second regex is the correct one.
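If only the base name matters, a glob may be enough. This is a different technique from the -regex approach above: -name tests just the last path component, so the leading ./ or directory name no longer counts, and the pattern '*.*.*' requires at least two dots in the name (one extra dot plus the extension dot). find_multi_dot is a hypothetical helper name.

```shell
# find_multi_dot DIR  (hypothetical helper)
# Lists regular files under DIR whose base name contains at least two dots.
find_multi_dot() {
    find "$1" -type f -name '*.*.*'
}
```

With the two example files in a directory, find_multi_dot lists only testing.1234.hello.this.is.a.file.ext.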
I have an XML file like this:
<fruit><apple>100</apple><banana>200</banana></fruit>
<fruit><apple>150</apple><banana>250</banana></fruit>
Now I want to delete all the text in the file except the values inside the apple tags. That is, the file should contain:
100
150
How can I achieve this?
:%s/.*apple>\(.*\)<\/apple.*/\1/
That should do what you need. Worked for me.
Basically, the pattern matches everything up to and including the opening apple tag, captures everything between the opening and closing apple tags in a group, and then matches the rest of the line. The whole line is replaced with the first backreference, which is the text between the apple tags.
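Outside Vim, the same capture-and-replace idea can be sketched with sed. This swaps in a different tool and is not the Vim command above; it assumes one apple element per line and no nested tags, and extract_apple is a made-up name.

```shell
# extract_apple FILE  (hypothetical helper)
# Prints only the text between <apple> and </apple> on each line of FILE.
extract_apple() {
    # -n plus the p flag prints only the lines where the substitution happened;
    # | is the delimiter so the / in </apple> needs no escaping
    sed -n 's|.*<apple>\([^<]*\)</apple>.*|\1|p' "$1"
}
```

On the two sample lines from the question this prints 100 and 150, one per line.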
I personally use this:
%s;.*<apple>\(\d*\)</apple>.*;\1;
The text contains '/', which is the default separator, so using ';' as the separator makes the command clearer.
I also found that the non-greedy match @Conspicuous Compiler mentioned should be
\{-}
instead of "{-}" in Vim.
However, after I changed Conspicuous' solution to
%s/.*apple>(.\{-\})<\/apple.*/\1^M/g
my Vim said it can't find the pattern.
In this case, one can use the general technique for collecting pattern matches
explained in my answer to the question "How to extract regex matches
using Vim".
In order to collect and store all of the matches in a list, run the Ex command
:let t=[] | %s/<apple>\(.\{-}\)<\/apple>\zs/\=add(t,submatch(1))[1:0]/g
The command purposely does not change the buffer's contents, only collects the
matched text. To set the contents of the current buffer to the
newline-separated list of matches, use the command
:0pu=t | +,$d_