Hi, I have a huge data set with thousands of columns. From one of the columns I need to extract a certain string pattern, e.g. 41242456-2020-12 or 41242456-2020-2 or 41242456-2020-200 (an 8-digit number, a year, and a 1 to 3 digit number), which is mixed in with other text in the string. Most of the time the number appears at the beginning, but sometimes it looks like the following:
Blah Blah LEX#41242456-2020-12BLABLABLAH
Blah Blah LEXIDA ID:41242456-2020-12BLAHBLAHBLAH etc.
Hence I am unable to extract them all with one formula.
Is there a formula or VBA code I can use to extract only 41242456-2020-12 and remove all other characters?
Search the web for how to use regular expressions in Excel.
The regular expression you want to match against is \d{8}-[12]\d{3}-\d{1,3}, which means:
eight digits
a dash
a "1" or a "2" (I assume a year starting with 3 or 0 is not valid)
three digits
a dash
one to three digits
You might want to use (\d{8})-([12]\d{3})-(\d{1,3}) so that the match also gives you the three numbers separately. Parentheses in a regular expression form a capture group, meaning 'return what matched this part.'
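To try the pattern outside Excel, here is a minimal sketch in Python using the standard re module. The function name extract_id is made up for illustration, and the sample strings are the ones from the question.

```python
import re

# The pattern from the answer, with capture groups for the three parts.
ID_PATTERN = re.compile(r"(\d{8})-([12]\d{3})-(\d{1,3})")

def extract_id(text):
    """Return the first ID found in text, or None if there is none."""
    match = ID_PATTERN.search(text)
    return match.group(0) if match else None

print(extract_id("Blah Blah LEX#41242456-2020-12BLABLABLAH"))  # 41242456-2020-12
```

match.groups() would return the three captured parts separately. Note that the pattern does not check what comes before the eight digits, so a longer run of digits would still match on its last eight.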
Please help me with a Unix command to replace whatever sits between two delimiter positions.
For example, I have multiple files with the header data below, and I want to replace the data between the 9th and 10th * delimiters.
ISA*00* *00* *ZZ*80881 *ZZ*TNC0022 *190115*1237*^*00501*000320089*0*P*|~
My output should look like this:
ISA*00* *00* *ZZ*80881 *ZZ*TNC0022 *190327*1237*^*00501*000320089*0*P*|~
Try this:
perl -pe 's/^((?:[^*]*\*){9})([^*]+)(.*)/${1}190327$3/'
The regexp searches for 9 occurrences ({9}) of any run of characters other than a star ([^*]*) followed by a star (\*), and stores all of that in the first capture group. The second capture group is at least one character that is not a star ([^*]+), and the third captures the rest of the line.
A matching line is replaced by the first part ${1}, your new value 190327, and the third part $3. The braces in ${1} keep Perl from reading the digits that follow as part of the group number.
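The same substitution can be sketched in Python, for readers who want to test the expression before running it over real files. The helper name replace_field and its stars_before parameter are assumptions made for this example; the regular expression is the one from the Perl command.

```python
import re

def replace_field(line, new_value, stars_before=9):
    """Replace the field that follows the given number of '*' delimiters."""
    pattern = re.compile(r"^((?:[^*]*\*){%d})([^*]+)(.*)" % stars_before)
    # A function is used as the replacement so new_value is inserted literally.
    return pattern.sub(lambda m: m.group(1) + new_value + m.group(3), line, count=1)

header = "ISA*00* *00* *ZZ*80881 *ZZ*TNC0022 *190115*1237*^*00501*000320089*0*P*|~"
print(replace_field(header, "190327"))
```

A line with fewer than nine stars does not match and is returned unchanged.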
In a file (tab-delimited text, CSV, or a database export) you have first name, last name and address. Some rows have a first name and an address but no last name. How can you list the rows where the last name is blank using a UNIX command?
FirstName LastName Street City
Dan, God, 1st Street, Chicago
Sam, , 2nd Street, Chicago
Adam, Smith, 3rd Street, Chicago
It could be CSV or a text file delimited by tabs or another character (; or :). The answer should be the 2nd row above.
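Assuming the input file is CSV, you can use awk:
awk -F, '$2 == ""' file
to print all the rows where the 2nd column (last name) is blank. If the fields are padded with spaces, as in the sample above where a space follows each comma, test for a whitespace-only field instead:
awk -F, '$2 ~ /^[[:space:]]*$/' file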
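For comparison, here is a small Python sketch of the same check using the standard csv module. The function name rows_with_blank_field is hypothetical, and the sketch assumes the last name is the second field and that a field holding only spaces counts as blank.

```python
import csv

def rows_with_blank_field(lines, index=1, delimiter=","):
    """Return the parsed rows whose field at position index is empty or whitespace."""
    reader = csv.reader(lines, delimiter=delimiter)
    return [row for row in reader if len(row) > index and row[index].strip() == ""]

sample = [
    "Dan, God, 1st Street, Chicago",
    "Sam, , 2nd Street, Chicago",
    "Adam, Smith, 3rd Street, Chicago",
]
for row in rows_with_blank_field(sample):
    print(row)
```

Passing delimiter="\t" or delimiter=";" covers the other file types mentioned in the question.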
Try this:
awk 'NF!=3' file
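It prints all lines where the number of whitespace-separated fields is not 3, so it only helps when the file is whitespace-delimited and no field contains a space.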
Since you didn't provide sample text, I've had to take some guesses about what you're after.
Here's the sample text I'm using:
06:33:20 0 1 james#brindle:/tmp$ cat sample.csv
first,last,address,otherstuff
first,,address,otherstuff
first,last,,
A simple grep ,, doesn't work as it also finds the last line:
06:33:22 0 0 james#brindle:/tmp$ grep ,, sample.csv
first,,address,otherstuff
first,last,,
Since the first name field is first on the line, we can simplify the problem a little bit: we want to find places where the first comma on the line is immediately followed by a second comma.
06:35:07 0 0 james#brindle:/tmp$ grep "^[^,]*,," sample.csv
first,,address,otherstuff
In that regex, the first ^ anchors the regex to the start of the line; [^,]* matches 0 or more occurrences of any character except the comma (yes, the ^ is doing something very different in this context), and finally ,, matches the two commas.
If you wanted to look for the 3rd field being empty you'd need to repeat yourself a little bit.
06:35:28 0 0 james#brindle:/tmp$ grep "^[^,]*,[^,]*,," sample.csv
first,last,,
Here you're looking for 0 or more non-comma characters, followed by a comma, followed by 0 or more non-commas, followed by two commas.
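The two grep patterns above follow one scheme, which can be generalised to any field number. The sketch below is an extension of the answer rather than part of it: the helper name empty_field_pattern is made up, and the `(?:,|$)` alternative is added so that an empty final field is also caught.

```python
import re

def empty_field_pattern(k):
    """Build a regex that matches lines whose k-th comma-separated field is empty."""
    # Skip k-1 complete fields, then require a comma or the end of the line.
    return re.compile(r"^(?:[^,]*,){%d}(?:,|$)" % (k - 1))

lines = ["first,last,address,otherstuff", "first,,address,otherstuff", "first,last,,"]
second_empty = [line for line in lines if empty_field_pattern(2).search(line)]
third_empty = [line for line in lines if empty_field_pattern(3).search(line)]
print(second_empty)  # ['first,,address,otherstuff']
print(third_empty)   # ['first,last,,']
```

Like the grep versions, this does not understand quoted fields that contain commas.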
I want to expand a string like the one below, but without using extra space.
a5b1c0d5a1a1
And the result should be:
aaaaabdddddaa
I am stuck here. How can I do it without extra space?
I would read each character and check that it is a letter, then take the next character and check that it is a number, then append the letter that many times to the resulting string.
In your example, the first thing I would read is a. It is a letter, so read the next character and check whether it is a number. It is, so append five a's to the resulting string.
Use a loop to append the letter that many times, for example.
UPDATE
Explaining my comment better.
So you're looping through the string.
At index 0 you have the 'a'. So you read a letter, then you expect a number, which is 5.
Now I split the string into two other strings. The first one has everything up to and including the letter, which in this case is only a.
The second one has everything after the number (5 in this case), which is b1c0d5a1a1.
So take the first string, concatenate four more a's (5-1, since you already have the first a), and then concatenate the rest of the string.
string = "a5b1c0d5a1a1"
string = string.substring(0, 1) + "aaaa" + string.substring(2)   // "a" + "aaaa" + "b1c0d5a1a1"
For counts like 0, you can adjust the substring indexes to remove the letter instead of adding more copies.
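The letter-then-count loop described at the start of this answer can be sketched in Python as follows. This is only an illustration: it assumes every count is a single digit, the function name expand is made up, and because it builds a new string it does not meet a strict no-extra-space requirement.

```python
def expand(encoded):
    """Expand letter/digit pairs, e.g. 'a3b1c0' -> 'aaab'."""
    if len(encoded) % 2 != 0:
        raise ValueError("expected letter/digit pairs")
    pieces = []
    for i in range(0, len(encoded), 2):
        letter, digit = encoded[i], encoded[i + 1]
        if not letter.isalpha() or digit not in "0123456789":
            raise ValueError("bad pair at index %d" % i)
        # A count of 0 contributes nothing, which drops the letter.
        pieces.append(letter * int(digit))
    return "".join(pieces)

print(expand("a3b1c0"))  # aaab
```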
My document looks something like this:
Line number one

Line number two

Line number three
I want the whole document to look like this:
Line number one
Line number two
Line number three
In other words, I want to remove all the empty lines. How can I accomplish this?
Try :g/^$/d, which will remove all blank lines. The g means global (run the command on every matching line), the ^$ is a regular expression that basically means 'match lines that start and end with nothing in between', and the d means delete. You can mix and match as much as you need :)
Another space-related command that may come in handy if you have random whitespace is :%s/\s\+$//, which trims any trailing whitespace (as @Bernhard points out, the $ anchor means there is at most one match per line, so the g flag is unnecessary).
Per the update, it is possible that the "empty" lines actually contain whitespace, in which case :g/^\s*$/d should work.
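To see what these three commands do to a piece of text without opening the editor, here is a rough Python equivalent. The function names are invented for this sketch, and it rejoins lines with "\n", so a trailing newline on the input is not preserved.

```python
import re

def delete_blank_lines(text, include_whitespace_only=False):
    """Mimic :g/^$/d, or :g/^\\s*$/d when include_whitespace_only is True."""
    blank = re.compile(r"\s*" if include_whitespace_only else r"")
    kept = [line for line in text.splitlines() if not blank.fullmatch(line)]
    return "\n".join(kept)

def trim_trailing_whitespace(text):
    """Mimic :%s/\\s\\+$// by stripping whitespace at the end of each line."""
    return "\n".join(re.sub(r"\s+$", "", line) for line in text.splitlines())

doc = "Line number one\n\nLine number two\n  \nLine number three"
print(delete_blank_lines(doc, include_whitespace_only=True))
```

With include_whitespace_only left at False, the line holding two spaces survives, which is the situation the last command in the answer addresses.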
The command I use is
:v/./d
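The :v command runs the given command on every line that does not match the pattern. Since . matches any single character, the lines that fail to match are exactly the empty ones, and d deletes them.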
It was inherited from ed.