I'm trying to parse through an application.log that has many lines that follow the same syntax below.
"Error","jrpp-237","10/13/11","02:55:04",,"File not found: /indexUsa~.cfm The specific sequence of files included or processed is: c:\websites\pj7fe4\indexUsa~.cfm '' "
I need to use some type of command to pull out what is listed between c:\websites\ and the next \
e.g. in this case it would be pj7fe4
I thought that the following command would work..
bin/sed -n '/c:\\websites\\/,/\\/p' upload/test.log
Unfortunately from reading further I now understand that this will return the entire line containing c:\websites through the \ and I need to know the in between, not the whole line.
To be more difficult I need to match all of the directory sub paths, not just one particular line as this is for multiple sites.
You're using range patterns incorrectly. You can't use it to limit the command (print in this case) to a part of the line, only to a range of lines. You also don't escape the backspaces.
Try this: sed 's/.*c:\\websites\\\([0-9a-zA-Z]*\)\\.*/\1/'
There's a good sed tutorial here: Sed - An Introduction and Tutorial by Bruce Barnett
grep way:
grep -Po "(?<=c:\\\websites\\\)[^\\\]+(?=\\\)" yourFile
test:
kent$ echo '"Error","jrpp-237","10/13/11","02:55:04",,"File not found: /indexUsa~.cfm The specific sequence of files included or processed is: c:\websites\pj7fe4\indexUsa~.cfm '' "'|grep -Po "(?<=c:\\\websites\\\)[^\\\]+(?=\\\)"
pj7fe4
Related
I am following the example in this post finding contents of one file into another file in unix shell script but want to print out differently.
Basically file "a.txt", with the following lines:
alpha
0891234
beta
Now, the file "b.txt", with the lines:
Alpha
0808080
0891234
gamma
I would like the output of the command is:
alpha
beta
The first one is "incorrect case" and second one is "missing from b.txt". The 0808080 doesn't matter and it can be there.
This is different from using grep -f "a.txt" "b.txt" and print out 0891234 only.
Is there an elegant way to do this?
Thanks.
Use grep with following options:
grep -Fvf b.txt a.txt
The key is to use -v:
-v, --invert-match
Invert the sense of matching, to select non-matching lines.
When reading patterns from a file I recommend to use the -F option as long as you not explicitly want that patterns are treated as regular expressions.
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings (instead of regular expressions), separated by newlines, any of which
is to be matched.
I have a file contained \n hidden behind each line:
input:
s3741206\n
s2561284\n
s4411364\n
s2516482\n
s2071534\n
s2074633\n
s7856856\n
s11957134\n
s682333\n
s9378200\n
s1862626\n
I want to remove \n behind
desired output:
s3741206
s2561284
s4411364
s2516482
s2071534
s2074633
s7856856
s11957134
s682333
s9378200
s1862626
however, I try this:
tr -d '\n' < file1 > file2
but it goes like below without space and new line
s3741206s2561284s4411364s2516482s2071534s2074633s7856856s11957134s682333s9378200s1862626
I also try sed $'s/\n//g' -i file1 and it doesn't work in mac os.
Thank you.
This is a possible solution using sed:
sed 's/\\n/ /g'
with awk
awk '{sub(/\\n/,"")} 1' < file1 > file2
What you are describing so far in your question+comments doesn't make sense. How can you have a multi-line file with a hidden newline character at the end of each line? What you show as your input file:
s3741206\n
s2561284\n
s4411364\n
etc.
where each "\n" above according to your comment is a single newline character "\n" is impossible. If those "\n"s were newline characters then your file would simply look like:
s3741206
s2561284
s4411364
etc.
There's really only 2 possibilities I can think of:
You are wrongly interpreting what you are seeing in your input file
and/or using the wrong terminology and you actually DO have \r\n
at the end of every line. Run cat -v file to see the \rs as
^Ms and run dos2unix or similar (e.g. sed 's/\r$//' file) to
remove the \rs - you do not want to remove the \ns or you will
no longer have a POSIX text file and so POSIX tools will exhibit
undefined behavior when run on it. If that doesn't work for you then
copy/paste the output of cat -v file into your question so we can
see for sure what is in your file.
Or:
It's also entirely possible that your file is a perfectly fine POSIX
text file as-is and you are incorrectly assuming you will have a
problem for some reason so also include in your question a
description of the actual problem you are having, include an example
of the command you are executing on that input file and the output
you are getting and the output you expected to get.
You could use bash-native string substitution
$ cat /tmp/newline
s3741206\n
s2561284\n
s4411364\n
s2516482\n
s2071534\n
s2074633\n
s7856856\n
s11957134\n
s682333\n
s9378200\n
s1862626\n
$ for LINE in $(cat /tmp/newline); do echo "${LINE%\\n}"; done
s3741206
s2561284
s4411364
s2516482
s2071534
s2074633
s7856856
s11957134
s682333
s9378200
s1862626
I am quite new to shell scripting.
I am scraping a website and the scraped text contains a lot of repetitions. Usually they are the menus on a forum, for example. Mostly, I do this in Python, but I thought that sed command will save me reading and printing the input, loops etc. I want to delete thousands of repeated lines from the same single file. I do not want to copy it to another file, because I will end up with 100 new files. The following is a shadow script which I run from the bash shell.
#!/bin/sed -f
sed -i '/^how$/d' input_file.txt
sed -i '/^is test$/d' input_file.txt
sed -i '/^repeated text/d' input_file.txt
This is the content of the input file:
how to do this task
why it is not working
this is test
Stackoverflow is a very helpful community of programmers
that is test
this is text
repeated text is common
this is repeated text of the above line
Then I run in the shell the following command:
sed -f scriptFile input_file.txt
I get the following error
sed: scriptFile line 2: untermindated `s' command
How can I correct the script, and what is the correct syntax of the command I should use to get it work?
Any help is highly appreciated.
assuming you know what your script is doing, it's very easy to put them into a script. in your case, the script should be:
/^how$/d
/^is test$/d
/^repeated text/d
that's good enough.
to make the script alone to be executable is easy too:
#!/usr/bin/env sed -f
/^how$/d
/^is test$/d
/^repeated text/d
then
chmod +x your_sed_script
./your_sed_script <old >new
here is a very good and compact tutorial. you can learn a lot from it.
following is an example from the site, just in case the link is dead:
If you have a large number of sed commands, you can put them into a file and use
sed -f sedscript <old >new
where sedscript could look like this:
# sed comment - This script changes lower case vowels to upper case
s/a/A/g
s/e/E/g
s/i/I/g
s/o/O/g
s/u/U/g
Wouldn't it be easier to do it with egrep followed by a mv, for example
egrep -v 'pattern1|pattern2|pattern3|...' <input_file.txt >tmpfile.txt
mv tmpfile.txt input_file.txt
Each pattern would describe the lines being deleted, much like in sed. You would not end up with additional files, because the mv removes them.
If you have so many pattern, that you don't want to specify them directly on the command line, you can store them in a file use the -f option of egrep.
For example suppose I have the following piece of data
ABC,3,4
,,ExtraInfo
,,MoreInfo
XYZ,6,7
,,XyzInfo
,,MoreXyz
ABC,1,2
,,ABCInfo
,,MoreABC
It's trivial to get grep to extract the ABC lines. However if I want to also grab the following lines to produce this output
ABC,3,4
,,ExtraInfo
,,MoreInfo
ABC,1,2
,,ABCInfo
,,MoreABC
Can this be done using grep and standard shell scripting?
Edit: Just to clarify there could be a variable number of lines in between. The logic would be to keep printing while the first column of the CSV is empty.
grep -A 2 {Your regex} will output the two lines following the found strings.
Update:
Since you specified that it could be any number of lines, this would not be possible as grep focuses on matching on a single line see the following questions:
How can I search for a multiline pattern in a file?
Regex (grep) for multi-line search needed
Why can't i match the pattern in this case?
Selecting text spanning multiple lines using grep and regular expressions
You can use this, although a bit hackity due to the grep at the end of the pipeline to mute out anything that does not start with 'A' or ',':
$ sed -n '/^ABC/,/^[^,]/p' yourfile.txt| grep -v '^[^A,]'
Edit: A less hackity way is to use awk:
$ awk '/^ABC/ { want = 1 } !/^ABC/ && !/^,/ { want = 0 } { if (want) print }' f.txt
You can understand what it does if you read out loud the pattern and the thing in the braces.
The manpage has explanations for the options, of which you want to look at -A under Context Line Control.
I have a set of 10 CSV files, which normally have a an entry of this kind
a,b,c,d
d,e,f,g
Now due to some error entries in this file have become of this kind
a,b,c,d
d,e,f,g
,,,
h,i,j,k
Now I want to remove the line with only commas in all the files. These files are on a Linux filesystem.
Any command that you recommend that can replaces the erroneous lines in all the files.
It depends on what you mean by replace. If you mean 'remove', then a trivial variant on #wnoise's solution is:
grep -v '^,,,$' old-file.csv > new-file.csv
Note that this deletes just those lines with exactly three commas. If you want to delete mal-formed lines with any number of commas (including zero) - and no other characters on the line, then:
grep -v '^,*$' ...
There are endless other variations on the regex that would deal with other scenarios. Dealing with full CSV data with commas inside quotes starts to need something other than a regex machine. It can be done, within broad limits, especially in more complex regex systems such as PCRE or Perl. But it requires more work.
Check out Mastering Regular Expressions.
sed 's/,,,/replacement/' < old-file.csv > new-file.csv
optionally followed by
mv new-file.csv old-file.csv
Replace or remove, your post is not clear... For replacement see wnoise's answer. For removing, you could use
awk '$0 !~ /,,,/ {print}' <old-file.csv > new-file.csv
What about trying to keep only lines which are matching the desired format instead of handling one exception ?
If the provided input is what you really want to match:
grep -E '[a-z],[a-z],[a-z],[a-z]' < oldfile.csv > newfile.csv
If the input is different, provide it, the regular expression should not be too hard to write.
Do you want to replace them with something, or delete them entirely? Either way, it can be done with sed. To delete:
sed -i -e '/^,\+$/ D' yourfile1.csv yourfile2.csv ...
To replace: well, see wnoise's answer, or if you don't want to create new files with the output,
sed -i -e '/^,\+$/ s//replacement/' yourfile1.csv yourfile2.csv ...
or
sed -i -e '/^,\+$/ c\
replacement' yourfile1.csv yourfile2.csv ...
(that should be entered exactly as is, including the line break). Of course, you can also do this with awk or perl or, if you're only deleting lines, even grep:
egrep -v '^,+$' < oldfile.csv > newfile.csv
I tested these to make sure they work, but I'd advise you to do the same before using them (just in case). You can omit the -i option from sed, in which case it'll print out the results (rather than writing them back to the file), or omit the output redirection >newfile.csv from grep.
EDIT: It was pointed out in a comment that some features of these sed commands only work on GNU sed. As far as I can tell, these are the -i option (which can be replaced with shell redirection, sed ... <infile >outfile ) and the \+ modifier (which can be replaced with \{1,\} ).
Most simply:
$ grep -v ,,,, oldfile > newfile
$ mv newfile oldfile
yes, awk or grep are very good option if you are working in linux platform. However you can use perl regex for other platform. using join & split options.