Bash get string between two 6-digit numbers

I have a UTF-8-BOM encoded text file full of lines, most of which start with a 6- to 10-digit number (the number increases with every line) followed by a string.
I want to get each of those "lines" (including the number) for further processing in my bash script.
This would be easy with a for loop and sed -n "${line}p", but unfortunately some of the strings I need contain line breaks, so I need a way of extracting the string between two 6+ digit numbers (including the first number), which mark the start of a new line.
An example of 3 "lines":
123456\tA random string here
123567\t another string
this time
it goes over
multiple lines
124567\t a normal string again
What I need:
123456\tA random string here
,
123567\t another string
this time
it goes over
multiple lines
and
124567\t a normal string again
A few things:
The strings are not surrounded with "" unfortunately
All numbers inside the strings are fewer than 6 digits long, so a number with 6 or more digits always marks the start of a new string line
The number increases, so the number before a string is always lower than the one after it
I'd like to convert all special characters like tabs or line breaks to \t or \n
I need the byte length later in the script, so each string must keep its length
I'm still new here, so if I put this in the wrong place or if it was already answered, tell me!

I hope the "UTF-8-BOM encoded" is not a trap.
Here is my proposal if it is not.
bash-3.1$ sed -En '/^[0-9]{6,10}/!{:a;H;n;/^[0-9]{6,10}/!ba;x;s/\n/\\n/g;s/\t/\\t/g;p};/^[0-9]{6,10}/{x;s/\t/\\t/g;1!p;x;h;z;}' input.txt
Output for sample input (with a newline at the end):
123456\tA random string here
123567\t another string\nthis time\nit goes over\nmultiple lines
124567\t a normal string again
I assumed that the relevant 6 to 10 digits always appear at the start of a line;
otherwise it gets trickier.
Note:
The string length increases by 1 for each newline \n or tab \t,
because the requested "\n" and "\t" escapes are two characters each.
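For comparison, the same record splitting can be sketched in Python. This is not the answer's method, just an alternative under the question's stated assumption that a record starts with 6 to 10 digits at the beginning of a line. It keeps the raw record (real tabs and newlines) separate from the escaped form, so the byte length can still be taken from the unescaped string. The names split_records and escape are hypothetical.

```python
import re
from typing import List

# Assumption from the question: a new record starts with 6 to 10 digits at the start of a line.
RECORD_START = re.compile(r"[0-9]{6,10}")


def split_records(text: str) -> List[str]:
    """Group physical lines into records; continuation lines are joined with real newlines."""
    if text.endswith("\n"):
        text = text[:-1]  # ignore the final line terminator so no empty trailing line is added
    records = []  # type: List[List[str]]
    for line in text.split("\n"):
        if RECORD_START.match(line) or not records:
            records.append([line])  # a 6-10 digit number at line start opens a new record
        else:
            records[-1].append(line)  # anything else continues the current record
    return ["\n".join(lines) for lines in records]


def escape(record: str) -> str:
    """Turn tabs and newlines into the two-character escapes \\t and \\n."""
    return record.replace("\t", "\\t").replace("\n", "\\n")
```

To read the file, something like open("input.txt", encoding="utf-8-sig") should drop a leading BOM before the text is passed to split_records; len(record.encode("utf-8")) then gives the byte length of an unescaped record.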

Related

How to pad text with spaces in LabVIEW

I would like to left-pad a string with spaces as needed so that it is always 8 characters long. I would also like to truncate the initial string to 8 characters. Example:
Given the string "1234", it should become "\s\s\s\s1234"
Given the string "123456789", it should become "12345678"
I've tried the "Scan From String" function with a format specifier of %8.8s, which I thought would limit the original string to 8 or fewer characters and then pad it with spaces as necessary, so that the result is 8 characters in total.
I was expecting the text "1234" to be turned into "    1234", but it just returned "1234".
It's LabVIEW "G" code, so I can't post text code.
You did not include a picture, so I am not sure whether you are using the right VI, but you should be using the Format Into String VI. The parameter you are looking for is called Width (see the documentation on format specifier syntax for more).
Here is how you would use that to do what you are asking:
To see the spaces, make sure the string constant or indicator is set to '\' Codes Display.
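As a point of comparison only, here is how the printf-style specifier %8.8s behaves in Python's % formatting: the precision .8 keeps at most 8 characters and the width 8 right-aligns the result with spaces. This is a Python analogy, not LabVIEW code; whether LabVIEW's Format Into String treats %8.8s in exactly the same way is an assumption worth checking. The helper name fit_to_width is hypothetical.

```python
def fit_to_width(s: str) -> str:
    # Python analogy (not LabVIEW): precision .8 truncates, width 8 left-pads with spaces
    return "%8.8s" % s
```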

How can I calculate how many characters trimStart removes?

I have a string, and I need to calculate the number of spaces that are removed when I call trimStart.
For example, I have the following string: \t\t  \tabcs
So I have two tabs, two spaces and another tab, all of which trimStart will remove (the rest are non-whitespace characters).
I need to know how many spaces will be removed. Since I don't know how wide a \t is, I can't just count it as a single character.
(My purpose is to calculate the column shift of a string caused by the trimming. Comparing the lengths before and after the trim obviously won't give me the desired result.)
Do you have any ideas?
Thanks!
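One possible approach, sketched in Python: treat the tab width as an explicit parameter and let each tab advance to the next tab stop, the way most editors render it. The tab width of 4 is only an assumed default, and the sketch considers spaces and tabs only; a real trimStart also strips other whitespace (newlines, non-breaking spaces, and so on). The function names are hypothetical; the second one reaches the same result with the standard str.expandtabs.

```python
def leading_whitespace_columns(s: str, tab_size: int = 4) -> int:
    """Columns occupied by the leading spaces/tabs of s, with tab stops every tab_size columns."""
    col = 0
    for ch in s:
        if ch == " ":
            col += 1
        elif ch == "\t":
            col += tab_size - col % tab_size  # jump to the next tab stop
        else:
            break  # first non-space, non-tab character: trimming stops here
    return col


def leading_whitespace_columns_expandtabs(s: str, tab_size: int = 4) -> int:
    """Same result via str.expandtabs on the prefix that lstrip(" \\t") would remove."""
    prefix = s[: len(s) - len(s.lstrip(" \t"))]
    return len(prefix.expandtabs(tab_size))
```

For "\t\t  \tabcs" with 4-column tab stops, the columns are 4, 8, 9, 10 and finally 12, so the shift is 12 columns even though only 5 characters are removed.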

VIM line count in status bar with thousands separator?

Is it possible to display the line count in the VIM status bar with thousands separators, preferably custom thousands separators?
Example:
set statusline=%L
should lead to "1,234,567" instead of "1234567".
I've found a way but it looks a bit crazy:
set statusline=%{substitute(line('$')\,'\\d\\zs\\ze\\%(\\d\\d\\d\\)\\+$'\,'\,'\,'g')}
The first round of backslashes is just for set (I have to escape , and \ itself).
What I'm actually setting the option to is this string:
%{substitute(line('$'),'\d\zs\ze\%(\d\d\d\)\+$',',','g')}
As a format string, this line contains one formatting code, which is %{...}. Everything in ... is evaluated as an expression and the result substituted back in.
The expression I'm evaluating is the following (with spaces added; had I added them to the real code, I would have had to escape them for set as well, forcing yet more backslashes):
substitute(line('$'), '\d\zs\ze\%(\d\d\d\)\+$', ',', 'g')
This is a call to the substitute function. The arguments are the source string, the regex, the replacement string, and a list of flags.
The string we're starting with is line('$'). This call returns the number of lines in the current buffer (or rather the number of the last line in the buffer). This is what %L normally shows.
The search pattern we're looking for is \d(\d\d\d)+$ (special vim craziness removed), i.e. a digit followed by 1 or more groups of 3 digits, followed by the end of the string. Grouping is spelled \%( \) in vim, and "1 or more" is \+, which gives us \d\%(\d\d\d\)\+$. The last bit of magic is \zs\ze. \zs sets the start of the matched string; \ze sets the end. This works as if everything before \zs were a look-behind pattern and everything after \ze were a look-ahead pattern.
What this amounts to is: We're looking for every position in the source string that is preceded by a digit and followed by exactly N digits (where N is a multiple of 3). This works like starting at the right and going left, skipping 3 digits each time. These are the positions where we need to insert a comma.
That's what the replacement string is: ',' (a comma). Because we're matching a string of length 0, we're effectively inserting into the source string (by replacing '' with ',').
Finally, the g flag says to do this with all matches, not just the first one.
TL;DR:
line('$') gives us the number of lines
substitute(..., '\d\zs\ze\%(\d\d\d\)\+$', ',', 'g') adds commas where we want them
%{ } lets us embed arbitrary expressions into statusline
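The \zs\ze trick described above corresponds to ordinary look-behind and look-ahead in other regex engines. As an illustration outside Vim, here is the same zero-width substitution in Python's re module; the function name and the sep parameter (for a custom thousands separator) are hypothetical.

```python
import re


def group_thousands(n: int, sep: str = ",") -> str:
    # (?<=\d) plays the role of Vim's \d\zs, (?=(?:\d{3})+$) the role of \ze\%(\d\d\d\)\+$
    # A function is used as the replacement so sep is inserted literally, never parsed for escapes
    return re.sub(r"(?<=\d)(?=(?:\d{3})+$)", lambda _m: sep, str(n))
```

Because every match is zero-width, the substitution only inserts sep at each qualifying position, just as replacing '' with ',' does in the Vim version.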

Erase characters from a string until a specific character

Python 3.4
I've got an Excel file with some messy organization, but one thing is for sure:
I need EVERYTHING except the stuff that appears before the very first comma in every single line, the comma included.
Example:
Printing the file gives me this:
Word1 Funky,Left Side,UDLRDURLUDRUDLUR
Nothing (because not) exists lol extraline,Right Side,RBRGBRGBRGRBGRBGBR
What I want to get is this:
Left Side,UDLRDURLUDRUDLUR
Right Side,RBRGBRGBRGRBGRBGBR
I'd also like to make that into a dictionary:
dictionary = {"Left Side":"UDLRDURLUDRUDLUR", "Right Side":"RBRGBRGBRGRBGRBGBR",}
So basically I want to get rid of everything up to the first comma (comma included), make the second part the key (it ends at the second comma), and make the third part the value (the line ends with the value).
What would be the easiest way to execute this?
Suppose s contains the string to be examined:
s = "word1,Left Side,UDLRDURLUDRUDLUR"
There are a number of ways to get rid of everything up to and including the first comma. You can use
Slicing coupled with find: s[s.find(',')+1:]
This expression will yield the desired result if the string s contains at least one comma, but it will yield the entire string if the string does not contain any commas.
Split coupled with indexing: s.split(',',1)[1]
This expression will yield the desired result if the string s contains at least one comma, but it will raise IndexError if the string does not contain any commas.
Regular expressions, but that's overkill here.
Other techniques, but those are also overkill here.
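The two techniques above, plus the dictionary the question also asks for, can be sketched as follows. This assumes each record sits on a single line and that the key is everything between the first and second comma; the function names are hypothetical, and the code sticks to features available in Python 3.4.

```python
def drop_first_field(s: str) -> str:
    # Slicing + find: find returns -1 when there is no comma, so the whole string is kept
    return s[s.find(",") + 1:]


def drop_first_field_split(s: str) -> str:
    # Split + index: raises IndexError when there is no comma
    return s.split(",", 1)[1]


def lines_to_dict(lines):
    """Map the second comma-separated field to the rest of each line."""
    result = {}
    for line in lines:
        rest = drop_first_field(line.rstrip("\n"))
        key, sep, value = rest.partition(",")  # the key ends at the second comma
        if sep:  # skip lines that do not have at least two commas
            result[key] = value
    return result
```

Passing an open file object to lines_to_dict would iterate it line by line; the trailing newline of each line is stripped before splitting.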

How can I detect "excessive spaces" in a string?

I'm making a simple Android game in Lua, and one of the setup steps is for the player to input a word (or sentence; basically, a string). The "word" may contain spaces, but I want to forbid the player from inputting a string with two or more spaces in a row, like "fly  bird".
I tried using string.match(word, "  "), string.match(word, "%s%s")
and string.match(word, "%s+%s+"), and none of these worked; somehow, the last one always "detects" a double space, whether or not the string has spaces.
What can I do to detect if there are multiple spaces in a row in a string? (Just detect, not replace, so I can send a warning message to the player.)
If it's exactly two spaces you are interested in, simply use find:
word:find('  ')
It returns the start and end positions of the first occurrence of two consecutive spaces, or nil if there is none.
input = input:gsub("%s+", " ")
The above code replaces every run of whitespace in the input with a single space, removing all excessive spacing.
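For readers outside Lua, the two ideas in these answers translate roughly to Python as below: a plain substring test for two consecutive spaces (detect only), and a regex that collapses whitespace runs (replace). Python's \s is close to, but not identical to, Lua's %s class, and the function names are hypothetical.

```python
import re


def has_double_space(word: str) -> bool:
    # Same idea as Lua's word:find('  '): look for two consecutive space characters
    return "  " in word


def collapse_whitespace(text: str) -> str:
    # Roughly mirrors input:gsub("%s+", " "): every run of whitespace becomes one space
    return re.sub(r"\s+", " ", text)
```

The first function is what a warning message would need, since it only detects the problem and leaves the player's input untouched.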
