Using sed in Red Hat Linux to replace text

I have some XML files in a directory and they all include the tag: <difficult>0</difficult>. I just want to change that to <difficult>1</difficult>.
I'm using the following command:
sed 's/difficult>0/difficult>1/g' *.xml
All that happens is that the full XML text of all the files gets displayed, with the difficult tag showing a value of 1, but nothing happens to the actual files. When I open them, they still all contain <difficult>0</difficult>.

sed -i 's/difficult>0/difficult>1/g' *.xml
Change a string in a file with sed?

Yes, sed usually puts its result on stdout. To change the files in-place, use the -i flag:
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if SUFFIX supplied)
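To make the SUFFIX behaviour concrete, here is a minimal sketch, assuming GNU sed. The temporary directory and the sample file contents are invented for the demonstration. The pattern also includes the surrounding angle brackets so that only the exact value 0 inside the tag is matched.

```shell
#!/bin/bash
# Sketch: in-place replacement that keeps a backup of each original file.
# The directory and file contents below are made up for the demonstration.
workdir=$(mktemp -d)
printf '<annotation><difficult>0</difficult></annotation>\n' > "$workdir/a.xml"
printf '<annotation><difficult>0</difficult></annotation>\n' > "$workdir/b.xml"

# -i.bak edits each file in place and saves the original as <name>.bak
sed -i.bak 's/<difficult>0</<difficult>1</g' "$workdir"/*.xml

# List the files that now carry the new value
grep -l '<difficult>1</difficult>' "$workdir"/*.xml
```

If the result is not what was expected, the untouched originals are still available as a.xml.bak and b.xml.bak.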

One more time :)
Don't use regex to parse XML or HTML. Use a proper parser and XPath:
xml ed -L -u '//difficult/text()' -v "1" file
Here xml is the xmlstarlet command (on some distributions it is installed as xmlstarlet).
See also: RegEx match open tags except XHTML self-contained tags

Related

How to replace an unknown string in multiple files under Linux?

I want to change multiple different strings across all files in a folder to one new string.
When the strings in the text files (within the same directory) look like this:
file1.json: "url/1.png"
file2.json: "url/2.png"
file3.json: "url/3.png"
etc.
I would need to point them all to a single PNG, i.e., "url/static.png", so all three files have the same URL inside pointing to the same PNG.
How can I do that?
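You can use find and sed for this. Make sure you are in the folder that contains the files you want to change.
find . -name '*.json' -print0 | xargs -0 sed -i 's|"url/1\.png"|"url/static.png"|g'
This replaces only "url/1.png"; repeat the command with the other names, or widen the pattern, to cover the remaining files.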
Suggesting a bash script:
#!/bin/bash
# for each file with extension .json in the current directory
for currFile in *.json; do
    # extract the file's ordinal from the current filename
    filesOrdinal=$(echo "$currFile" | grep -o "[[:digit:]]\+")
    # use the ordinal to identify the string and replace it in the current file
    sed -i -r 's|url/'"$filesOrdinal"'\.png|url/static.png|' "$currFile"
done
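Both approaches above name one specific number per substitution. A single pattern that accepts any run of digits can cover all the files in one pass. The following is only a sketch, assuming GNU find and GNU sed; the function name retarget_png_urls is hypothetical.

```shell
#!/bin/bash
# Sketch: rewrite every "url/<digits>.png" to "url/static.png" in the .json
# files directly inside a directory. retarget_png_urls is a hypothetical name.
retarget_png_urls() {
    local dir=${1:-.}
    # [0-9][0-9]* = one or more digits; \. = a literal dot.
    # -exec ... {} + hands the file names straight to sed, so names
    # containing spaces are handled safely.
    find "$dir" -maxdepth 1 -type f -name '*.json' \
        -exec sed -i 's|"url/[0-9][0-9]*\.png"|"url/static.png"|g' {} +
}

# Usage: run against the current directory
# retarget_png_urls .
```

Using | as the sed delimiter avoids having to escape the slashes in the URL.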

bulk rename pdf files with name from specific line of its content in linux

I have multiple PDF files that I want to rename. Each new name should be taken from a specific line of the PDF's content (say, the 5th). For example, if a file's 5th line is "some string", that string should become the name of the file, and the same goes for the rest of the files. I tried this in the terminal:
for pdf in *.pdf
do
filename=`basename -s .pdf "${pdf}"`
newname=`awk 'NR==5' "${filename}.pdf"`
mv "${pdf}" "${newname}"
done
It renames the files, but the new name is an invalid string. I know the system doesn't see the file as plain text: there are images, metadata, XML tags and so on. Is there a way to take the content from that line?
Out of the box, bash and its usual utilities are not able to read PDF files. However, less can recover the text from a PDF file when its input preprocessor (lesspipe) is set up to handle PDFs, as it is on some systems. You could change your script as follows:
for pdf in *.pdf
do
    mv "$pdf" "$(less "$pdf" | sed '5q;d').pdf"
done
Explanation:
less "$pdf": displays the text part of the PDF file, preserving spacing. Run it on a few files first to check that it returns the desired output.
sed '5q;d': extracts the 5th line of its input.
Optionally, you could use the following command inside the loop to remove blank lines and collapse repeated spaces before the line is picked:
mv "$pdf" "$(less "$pdf" | sed -e '/^\s*$/d' -e 's/ \+/ /g' | sed '5q;d').pdf"
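An extracted line can still contain characters that are awkward in a file name, such as a slash, and it can be empty. The sketch below separates the "pick and clean line N" step into a function that reads already-extracted text on stdin. The name title_from_text is hypothetical, and GNU sed is assumed.

```shell
#!/bin/bash
# Sketch: turn line N of already-extracted text into a safer file name.
# title_from_text is a hypothetical helper; it reads text on stdin.
title_from_text() {
    local line_no=${1:-5}
    # 1) drop blank lines, 2) collapse whitespace runs to one space,
    # 3) trim leading/trailing space, 4) replace "/" with "_",
    # then print only line N of what is left.
    sed -e '/^[[:space:]]*$/d' \
        -e 's/[[:space:]][[:space:]]*/ /g' \
        -e 's/^ //; s/ $//' \
        -e 's|/|_|g' | sed -n "${line_no}{p;q;}"
}

# Possible use inside the loop (skips files whose line N is empty,
# and -n refuses to overwrite an existing file):
# for pdf in *.pdf; do
#     name=$(less "$pdf" | title_from_text 5)
#     [ -n "$name" ] && mv -n -- "$pdf" "$name.pdf"
# done
```

Note that the line number here counts non-blank lines, as in the optional command above.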

How to replace a string containing "\u2015"?

Does anyone know how to replace a string containing \u2015 in a SED command like the example below?
sed -ie "s/some text \u2015 some more text/new text/" inputFileName
You just need to escape the backslash. The example below works in GNU sed version 4.2.1:
$ echo "some text \u2015 some more text" | sed -e "s/some text \\\u2015 some more text/abc/"
abc
Also, you don't need the -i flag here; according to the man page, it is only for editing files in place:
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if extension supplied). The default operation mode is to break symbolic and hard links. This can be changed with --follow-symlinks and --copy.
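How many backslashes are needed depends on the shell quoting around the sed script, because the shell processes double-quoted text before sed sees it. A small sketch, assuming bash and GNU sed; the variable names are made up for illustration:

```shell
#!/bin/bash
# Input containing the literal six characters \u2015 (not the Unicode character)
input='some text \u2015 some more text'

# Single quotes: the shell passes \\ through untouched, and sed reads \\ as
# one literal backslash.
single=$(printf '%s\n' "$input" | sed 's/some text \\u2015 some more text/new text/')

# Double quotes: the shell first reduces each \\ to \, so sed only receives
# the \\ it needs when the script contains \\\\ .
double=$(printf '%s\n' "$input" | sed "s/some text \\\\u2015 some more text/new text/")

printf '%s\n%s\n' "$single" "$double"
```

Single quotes are usually the simpler choice for sed scripts that do not need shell variables.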
Not sure if this is exactly what you need, but maybe you should take a look at the native2ascii tool to convert such Unicode escapes.
Normally it replaces all characters that cannot be displayed in ISO-8859-1 with their Unicode escapes (written with \u), but it also supports the reverse conversion. Assuming you have a UTF-8 file named "input" containing \u00abSome \u2015 string\u00bb, then executing
native2ascii -encoding UTF-8 -reverse input output
will result in "output" file with «Some ― string».

linux extract string between tags and paste between others tags

I have files with xml text like:
<tag1>unknown string1</tag1>blablabla....<tag2></tag2>
I want to use sed (or another command) to extract the string between the tag1 tags and paste it between the tag2 tags, resulting in:
<tag1>unknown string1</tag1>blablabla....<tag2>unknown string1</tag2>
Thanks.
I found a solution!
sed 's/\(.*<tag1>\)\(.*\)\(<\/tag1>.*<tag2>\)\(.*\)\(<\/tag2>.*\)/\1\2\3\2\5/' file
This divides each line into groups (back-references) and then reassembles them in the required order.
Try this sed command, which expects <tag2> on the line directly after <tag1>:
sed 'N;s/\(<tag1>\(.*\)<\/tag1>\n<tag2>\).*\(<\/tag2>\)/\1\2\3/' FileName
Output:
<tag1>unknown string1</tag1>
<tag2>unknown string1</tag2>
This might work for you (GNU sed):
sed -r '/<tag1>/h;/<tag2>/{G;s/>.*(<.*)\n.*>(.*)<.*/>\2\1/}' file
This makes a copy of tag1 in the hold space (HS) and on encountering tag2 appends the HS to the current line and uses pattern matching to produce the required string.
N.B. this assumes one tag per line.

Change part of links in .html files

I am currently in the process of migrating a MediaWiki to SharePoint.
I've created a dump of the wiki pages and am now in the process of modifying the files for a seamless import into a SharePoint wiki.
The last problem remaining is that the addresses of the pages have changed, so the links in the .html files won't work anymore.
The links are currently in the following format:
../../../a/b/c/sitename.html
The format I want to get to is:
http://host/sites/site/wiki/sitename.aspx
I can replace the first part (../../../a/b/c/) with sed.
The problem I'm facing lies in the second part (sitename.html). I want to keep sitename but replace the .html extension with .aspx. The method used should be applicable to different sitenames so that I don't have to add an extra sed command for every sitename.
One way with awk:
awk -F/ '/\.html/{sub(/\..*/,"",$NF); print "http://host/sites/site/wiki/"$NF".aspx"}' htmlfile
Try this with GNU sed:
echo "../../../a/b/c/whateversitename.html" | sed 's#\(../../../a/b/c/\)\(.*\)\.html#http://host/sites/site/wiki/\2.aspx#g'
sed accepts multiple expressions, so if you are already using sed you can add another expression to the same command instead of running sed a second time:
$ cat log
../../../a/b/c/sitename.html
$ sed -e 's#../../../a/b/c#http://host/sites/site/wiki#g' \
> -e 's#html$#aspx#g' log
http://host/sites/site/wiki/sitename.aspx
Or as a single expression:
$ sed -e 's#../.*/\([^.]*\).html#http://host/sites/site/wiki/\1.aspx#g' log
http://host/sites/site/wiki/sitename.aspx
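The examples above are tested on a line that holds nothing but the link. In a real HTML file, one line may hold several links, and a greedy .* can then swallow everything between the first and the last of them. The following sketch restricts the captured page name so it cannot cross a quote or a slash. The function name rewrite_wiki_links is hypothetical, and it assumes the links appear inside double-quoted attributes.

```shell
#!/bin/bash
# Sketch: rewrite old wiki links read from stdin. rewrite_wiki_links is a
# hypothetical name; the host and paths are the ones from the question.
rewrite_wiki_links() {
    # \.\./ matches a literal "../"; [^"/]* keeps the captured page name from
    # running past the end of one link into the next.
    sed 's#\.\./\.\./\.\./a/b/c/\([^"/]*\)\.html#http://host/sites/site/wiki/\1.aspx#g'
}

# Usage (page.html is a placeholder file name):
# rewrite_wiki_links < page.html > page.new.html
```

Writing to a new file first makes it easy to compare the result with the original before replacing it.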
