Change part of links in .html files - string

I am currently migrating a MediaWiki installation to SharePoint.
I've created a dump of the wiki pages and am now modifying the files for a seamless import into a SharePoint wiki.
The last remaining problem is that the addresses of the pages have changed, so the links in the .html files won't work anymore.
The links are currently in the following format:
../../../a/b/c/sitename.html
The format I want to get to is:
http://host/sites/site/wiki/sitename.aspx
I can replace the first part (../../../a/b/c/) with sed.
The problem I'm facing lies in the second part (sitename.html). I want to keep sitename but replace the .html extension with .aspx. The method used should be applicable to different sitenames so that I don't have to add an extra sed command for every sitename.

One way with awk:
awk -F/ '/\.html/{sub(/\..*/,"",$NF); print "http://host/sites/site/wiki/"$NF".aspx"}' htmlfile

Try this GNU sed,
echo "../../../a/b/c/whateversitename.html" | sed 's#\(../../../a/b/c/\)\(.*\)\.html#http://host/sites/site/wiki/\2.aspx#g'

sed accepts multiple expressions, so if you are already running one sed command you can add another -e expression to it rather than piping into a second sed:
$ cat log
../../../a/b/c/sitename.html
$ sed -e 's#../../../a/b/c#http://host/sites/site/wiki#g' \
> -e 's#html$#aspx#g' log
http://host/sites/site/wiki/sitename.aspx
Or as a single expression:
$ sed -e 's#../.*/\([^.]*\)\.html#http://host/sites/site/wiki/\1.aspx#g' log
http://host/sites/site/wiki/sitename.aspx
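The answers above are demonstrated on a line that contains only the link. In a real .html file the link sits inside an attribute, so a pattern that stops at quotes and slashes is safer. A minimal sketch, assuming sed -E is available (GNU and BSD sed both accept it), that every link starts with one or more ../ segments, and that site names contain no dots or slashes. The helper name rewrite_links is made up for illustration:

```shell
# Hypothetical helper: rewrite relative wiki links read from stdin or files.
# Assumes every link starts with one or more "../" and ends in "<name>.html".
rewrite_links() {
  sed -E 's#(\.\./)+([^"/]+/)*([^"/.]+)\.html#http://host/sites/site/wiki/\3.aspx#g' "$@"
}

# Example: a link embedded in an anchor tag.
printf '%s\n' '<a href="../../../a/b/c/sitename.html">Site</a>' | rewrite_links
```

The example prints <a href="http://host/sites/site/wiki/sitename.aspx">Site</a>. Only the third group (the page name) is kept, so the same expression works for any site name and any directory depth.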

Related

Replacing sed with sed on RHEL6.7

I am trying to replace a sed command with a sed command and it keeps falling over so after a few hours of "picket fencing" I thought I would ask the question here.
I have various bash scripts that contain this kind of line:
sed 's/a hard coded server name servername.//'
I would like to replace it with:
sed "s/a hard coded server name $(hostname).//"
Note the addition of double quotes so that $(hostname) is expanded, which makes this a little trickier than I expected.
So this was my first of many failed attempts:
cat file | sed 's!sed \'s\/a hard coded server name servername.\/\/\'!sed \"s\/a hard coded server name $(hostname).\/\/\"!g'
I also tried using sed's nice "-e" option to break down the replace into parts to try and target the problem areas. I wouldn't use the "-e" switch in a solution but it is useful sometimes for debugging:
cat file | sed -e 's!servername!\$\(hostname\)!' -e 's!\| sed \'s!\| sed \"s!'
The first sed works as expected (nothing fancy happening here) and the second fails so no point adding the third that would have to replace the closing double quote.
At this point my history descends into chaos so no point adding any more failed attempts.
I wanted to use the first replacement in a single command as the script is full of sed commands and I wanted to target just one specific command in the script.
Any ideas would be appreciated.
Here's how you could do it in awk if you ignore (or handle) metachars in the old and new text like you would with sed:
$ awk -v old="sed 's/a hard coded server name servername.//'" \
-v new="sed 's/a hard coded server name '\"\$(hostname)\"'.//'" \
'{sub(old,new)}1' file
sed 's/a hard coded server name '"$(hostname)"'.//'
or to avoid having to deal with metachars, use only strings for the comparison and replacement:
$ awk -v old="sed 's/a hard coded server name servername.//'" \
-v new="sed 's/a hard coded server name '\"\$(hostname)\"'.//'" \
's=index($0,old){$0=substr($0,1,s-1) new substr($0,s+length(old))}1' file
sed 's/a hard coded server name '"$(hostname)"'.//'
Follow the behavior of templating tools by using a sequence that should never appear in actual use and replace that. For example, using colons simply because they require less quoting:
#!/bin/bash
sed "s/:servername:/$(hostname)/g" <<EOF > my_new_script.bash
echo "This is :servername:"
EOF
I've used echo in the internal script for purposes of clarity. You could have equally used something like:
sed 's/complex substitution :servername:/inside quotes :servername:/'
which avoids quoting hassles because the outer sed is treating the here document as plain text.
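For completeness, the attempts in the question fail because a backslash cannot escape a single quote inside a single-quoted shell string. The string has to be closed, an escaped quote added, and the string reopened ('\''). A sed-only sketch under that assumption, using # as the delimiter so the slashes need no escaping; the helper name fix_sed_line is hypothetical:

```shell
# Hypothetical helper: rewrite the hard-coded sed line in the given files
# (or stdin). $(hostname) stays literal here because it sits inside single
# quotes; it is only expanded later, when the rewritten script runs.
fix_sed_line() {
  sed 's#sed '\''s/a hard coded server name servername\.//'\''#sed "s/a hard coded server name $(hostname).//"#' "$@"
}

printf '%s\n' "sed 's/a hard coded server name servername.//'" | fix_sed_line
```

Because the pattern spells out the whole original command, other sed lines in the same script are left untouched.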

using sed with variable containing url

I am trying to read a file and replace a placeholder with the content of another file. The problem is that the variable contains URLs, which seem to cause problems in sed. In addition, how do I keep the newlines from images.txt? Is there a way to make my solution work, or is there another solution better suited to my problem? I want to overwrite the content of a file with the content of a backup file, and that step should include replacing a placeholder with the content of a third file. Thank you.
What I currently use:
<images.html
TEXT=$(<images.txt)
sed 's~URLS~$TEXT~g' imagesbu.html > images.html
This does not work and just shows:
sed: -e expression #1, char 80: unknown option to `s'
Content of the file is:
https://cdn.tutsplus.com/vector/uploads/legacy/tuts/165_Shiny_Dice/27.jpg
https://cdn.tutsplus.com/vector/uploads/legacy/tuts/165_Shiny_Dice/27.jpg
If there is no newline in the file, it works.
Try altering your sed delimiter so that it is not a forward slash:
sed "s~URLS~$TEXT~g" imagesbu.html > images.html
Edit: Your original sed command doesn't work because of the above, and because you are trying to replace a single word with multiple lines. Try awk instead:
awk -v u="$TEXT" '{gsub(/URLS/,u)}1' imagesbu.html > images.html
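Passing the text with -v has two caveats: some awk implementations may reject a literal newline in a -v value, and gsub treats & in the replacement as the matched text, which matters for URLs with query strings. A sketch that sidesteps both by letting awk read images.txt itself and splicing the text in with index and substr. The function name fill_placeholder is hypothetical, and the sketch assumes images.txt is not empty:

```shell
# Hypothetical helper: fill_placeholder TEXTFILE TEMPLATE
# Prints TEMPLATE with every literal "URLS" replaced by the contents of TEXTFILE.
fill_placeholder() {
  awk '
    # First file: collect its lines, joined by newlines.
    NR == FNR { text = ((NR > 1) ? text "\n" : "") $0; next }
    # Second file: splice the text in as a plain string (no regex, no &).
    {
      out = ""
      while ((i = index($0, "URLS")) > 0) {
        out = out substr($0, 1, i - 1) text
        $0 = substr($0, i + 4)   # 4 = length("URLS")
      }
      print out $0
    }
  ' "$1" "$2"
}

# Usage: fill_placeholder images.txt imagesbu.html > images.html
```

The newlines from images.txt are preserved because the text is stored and printed as-is rather than passed through a sed or gsub replacement string.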

Extract Directory from Log File with sed

I'm trying to parse through an application.log that has many lines that follow the same syntax below.
"Error","jrpp-237","10/13/11","02:55:04",,"File not found: /indexUsa~.cfm The specific sequence of files included or processed is: c:\websites\pj7fe4\indexUsa~.cfm '' "
I need to use some type of command to pull out what is listed between c:\websites\ and the next \
e.g. in this case it would be pj7fe4
I thought that the following command would work..
bin/sed -n '/c:\\websites\\/,/\\/p' upload/test.log
Unfortunately from reading further I now understand that this will return the entire line containing c:\websites through the \ and I need to know the in between, not the whole line.
To be more difficult I need to match all of the directory sub paths, not just one particular line as this is for multiple sites.
You're using range patterns incorrectly. A range limits a command (print, in this case) to a range of lines, not to part of a line. You also don't escape the backslashes.
Try this: sed 's/.*c:\\websites\\\([0-9a-zA-Z]*\)\\.*/\1/'
There's a good sed tutorial here: Sed - An Introduction and Tutorial by Bruce Barnett
grep way:
grep -Po "(?<=c:\\\websites\\\)[^\\\]+(?=\\\)" yourFile
test:
kent$ echo '"Error","jrpp-237","10/13/11","02:55:04",,"File not found: /indexUsa~.cfm The specific sequence of files included or processed is: c:\websites\pj7fe4\indexUsa~.cfm '' "'|grep -Po "(?<=c:\\\websites\\\)[^\\\]+(?=\\\)"
pj7fe4
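Since the log covers multiple sites, it may help to print only the captured directory and collapse duplicates. A sketch building on the sed answer above, assuming the directory name is everything up to the next backslash; the function name list_site_dirs is made up for illustration:

```shell
# Hypothetical helper: list each distinct site directory once.
# -n plus the p flag prints only lines where the substitution matched.
list_site_dirs() {
  sed -n 's/.*c:\\websites\\\([^\\]*\)\\.*/\1/p' "$@" | sort -u
}

# Usage: list_site_dirs upload/test.log
```

Lines that do not mention c:\websites\ are dropped instead of being echoed back unchanged.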

Linux command to replace string in LARGE file with another string

I have a huge SQL file that gets executed on the server. The dump is from my machine and contains a few settings specific to my machine. So basically, I want every occurrence of "c://temp" to be replaced by "//home//some//blah"
How can this be done from the command line?
sed is a good choice for large files.
sed -i.bak -e 's%C://temp%//home//some//blah%' large_file.sql
It is a good choice because it doesn't read the whole file at once to change it. Quoting the manual:
A stream editor is used to perform basic text transformations on an input stream (a file or input from a pipeline). While in some ways similar to an editor which permits scripted edits (such as ed), sed works by making only one pass over the input(s), and is consequently more efficient. But it is sed's ability to filter text in a pipeline which particularly distinguishes it from other types of editors.
The relevant manual section is here. A small explanation follows
-i.bak enables in-place editing, leaving a backup copy with a .bak extension.
s%foo%bar% uses s, the substitution command, which replaces matches of the first string between the % signs, 'foo', with the second string, 'bar'. It's usually written as s/foo/bar/, but because your strings have plenty of slashes, it's more convenient to use a different delimiter so you avoid having to escape them.
Example
vinko@mithril:~$ sed -i.bak -e 's%C://temp%//home//some//blah%' a.txt
vinko@mithril:~$ more a.txt
//home//some//blah
D://temp
//home//some//blah
D://temp
vinko@mithril:~$ more a.txt.bak
C://temp
D://temp
C://temp
D://temp
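Note that the command in this example replaces only the first match on each line and matches an uppercase C only, while the question asks for every occurrence of a lowercase "c://temp". A sketch that covers both, assuming either case of the drive letter should be replaced; the function name replace_temp_path is an illustration, not part of the answer:

```shell
# Hypothetical helper: replace the path in files given as arguments (or stdin).
replace_temp_path() {
  # [cC] accepts either case; the trailing g replaces every match on a line.
  sed 's%[cC]://temp%//home//some//blah%g' "$@"
}

printf '%s\n' 'SET a="c://temp", b="C://temp", c="D://temp";' | replace_temp_path
```

The example prints SET a="//home//some//blah", b="//home//some//blah", c="D://temp";. The sketch writes to standard output; for in-place editing with a backup, the same expression can be used with -i.bak as shown in the answer.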
Just for completeness. In place replacement using perl.
perl -i -p -e 's{c://temp}{//home//some//blah}g' mysql.dmp
No backslash escapes required either. ;)
Try sed? Something like:
sed 's/c:\/\/temp/\/\/home\/\/some\/\/blah/' mydump.sql > fixeddump.sql
Escaping all those slashes makes this look horrible though, here's a simpler example which changes foo to bar.
sed 's/foo/bar/' mydump.sql > fixeddump.sql
As others have noted, you can choose your own delimiter, which would prevent the leaning toothpick syndrome in this case:
sed 's|c://temp|//home//some//blah|' mydump.sql > fixeddump.sql
The clever thing about sed is that it operates on a stream rather than on the whole file at once, so you can process huge files using only a modest amount of memory.
There's also a non-standard UNIX utility, rpl, which does the exact same thing that the sed examples do; however, I'm not sure whether rpl operates streamwise, so sed may be the better option here.
The sed command can do that.
Rather than escaping the slashes, you can choose a different delimiter (_ in this case):
sed -e 's_c://temp/_/home//some//blah/_' file1.txt > file2.txt
perl -pi -e 's#c://temp#//home//some//blah#g' yourfilename
-p wraps the script in a loop: Perl reads the specified file line by line, runs the search and replace on each line, and prints the result.
-i edits the file in place. It is used together with -p here so that the printed lines are written back to the file.
-e just means execute this Perl code.
Good luck
gawk
awk '{gsub("c://temp","//home//some//blah")}1' file

Replacing a line in a csv file?

I have a set of 10 CSV files, which normally have entries of this kind
a,b,c,d
d,e,f,g
Now, due to some error, the entries in these files have become like this
a,b,c,d
d,e,f,g
,,,
h,i,j,k
Now I want to remove the line with only commas in all the files. These files are on a Linux filesystem.
Is there a command you would recommend that can replace the erroneous lines in all the files?
It depends on what you mean by replace. If you mean 'remove', then a trivial variant on @wnoise's solution is:
grep -v '^,,,$' old-file.csv > new-file.csv
Note that this deletes just those lines with exactly three commas. If you want to delete mal-formed lines with any number of commas (including zero) - and no other characters on the line, then:
grep -v '^,*$' ...
There are endless other variations on the regex that would deal with other scenarios. Dealing with full CSV data with commas inside quotes starts to need something other than a regex machine. It can be done, within broad limits, especially in more complex regex systems such as PCRE or Perl. But it requires more work.
Check out Mastering Regular Expressions.
sed 's/,,,/replacement/' < old-file.csv > new-file.csv
optionally followed by
mv new-file.csv old-file.csv
Replace or remove, your post is not clear... For replacement see wnoise's answer. For removing, you could use
awk '$0 !~ /,,,/ {print}' <old-file.csv > new-file.csv
What about trying to keep only lines which are matching the desired format instead of handling one exception ?
If the provided input is what you really want to match:
grep -E '[a-z],[a-z],[a-z],[a-z]' < oldfile.csv > newfile.csv
If the input is different, provide it, the regular expression should not be too hard to write.
Do you want to replace them with something, or delete them entirely? Either way, it can be done with sed. To delete:
sed -i -e '/^,\+$/ D' yourfile1.csv yourfile2.csv ...
To replace: well, see wnoise's answer, or if you don't want to create new files with the output,
sed -i -e '/^,\+$/ s//replacement/' yourfile1.csv yourfile2.csv ...
or
sed -i -e '/^,\+$/ c\
replacement' yourfile1.csv yourfile2.csv ...
(that should be entered exactly as is, including the line break). Of course, you can also do this with awk or perl or, if you're only deleting lines, even grep:
egrep -v '^,+$' < oldfile.csv > newfile.csv
I tested these to make sure they work, but I'd advise you to do the same before using them (just in case). You can omit the -i option from sed, in which case it'll print out the results (rather than writing them back to the file), or omit the output redirection >newfile.csv from grep.
EDIT: It was pointed out in a comment that some features of these sed commands only work on GNU sed. As far as I can tell, these are the -i option (which can be replaced with shell redirection, sed ... <infile >outfile ) and the \+ modifier (which can be replaced with \{1,\} ).
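Since the question covers ten files and -i is not available in every sed, the portable variant mentioned above can be wrapped in a loop. A sketch, assuming mktemp is available and using a made-up function name strip_comma_lines:

```shell
# Hypothetical helper: remove comma-only lines from each file given.
strip_comma_lines() {
  for f in "$@"; do
    tmp=$(mktemp) || return 1
    # \{1,\} is the portable spelling of \+; sed exits 0 even if every
    # line is deleted, so the && below is not tripped by an empty result.
    sed '/^,\{1,\}$/d' "$f" > "$tmp" && cat "$tmp" > "$f"
    rm -f "$tmp"
  done
}

# Usage: strip_comma_lines *.csv
```

Writing the result back with cat rather than mv keeps the original file's permissions and ownership.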
Most simply:
$ grep -v ,,, oldfile > newfile
$ mv newfile oldfile
Yes, awk and grep are very good options if you are working on a Linux platform. On other platforms you can use Perl regular expressions instead, together with join and split.

Resources