Batch text shuffling system - string

I have a text string whose length is defined by the user.
As an example, the user has entered 1234567890.
What I want is to repeatedly pull out the first character, followed by the 3rd character of what remains.
So we get the following:
1st | 1234567890 = 1
3rd | 234567890 = 14
1st | 23567890 = 142
3rd | 3567890 = 1426
1st | 357890 = 14263
3rd | 57890 = 142638
1st | 5790 = 1426385
3rd | 790 = 14263850
1st | 79 = 142638507
3rd | 9 = 1426385079
I also need to account for the fact that at the end, the last two steps will have fewer than three digits to work with.
Any ideas on how I could achieve this in batch?

This is where batch string manipulation gets really useful:
@echo off
set str="1234567890"
for %%a in (%str%) do set str=%%~a
set newstr=
:Loop
set "first=%str:~0,1%"
set "fourth=%str:~3,1%"
set "str=%str:~1,2%%str:~4%"
set "newstr=%newstr%%first%%fourth%"
if not "%fourth%"=="" goto Loop
set "newstr=%newstr%%str:~1%%str:~0,1%"
echo.%newstr%
For the example input 1234567890 from your question, the output is indeed:
1426385079
Explanation
This code uses a loop built from a label and goto (roughly the equivalent of a while loop in C).
In every iteration, the first and fourth characters are extracted from str and appended to newstr, which will eventually hold the final output.
str is then rebuilt by joining the following two substrings:
%str:~1,2% extracts two characters starting from the second character (indices start at 0, so the second character has index 1).
%str:~4% extracts all characters starting from the fifth.
The new value of str is the old value without its first and fourth characters.
The loop stops once fourth is an empty string, which happens when str held three or fewer characters at the start of the iteration. After the loop, the remaining (two or fewer) characters are appended to newstr in the correct order; this is the special case that you wanted to account for.
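For readers who find the batch substring syntax hard to follow, the same loop can be sketched in Python. This is only an illustration of the steps described above, not part of the batch solution; the function name shuffle is made up for the example.

```python
def shuffle(text: str) -> str:
    """Repeatedly take the 1st and 4th characters of what is left."""
    remaining = text
    result = []
    while True:
        first = remaining[0:1]    # like %str:~0,1%
        fourth = remaining[3:4]   # like %str:~3,1% (empty if too short)
        # Drop both extracted characters, like %str:~1,2%%str:~4%
        remaining = remaining[1:3] + remaining[4:]
        result.append(first + fourth)
        if not fourth:
            break
    # At most two characters are left; they are appended in swapped order,
    # exactly as the last "set" line of the batch script does.
    result.append(remaining[1:] + remaining[0:1])
    return "".join(result)


print(shuffle("1234567890"))  # 1426385079
```

Slicing past the end of a Python string yields an empty string, which plays the same role as the empty fourth variable in the batch version.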
Hope that helps!

Related

Replace string parts that appear twice Oracle

I am trying to work out in Oracle how to isolate/highlight word combinations in a concatenated string like the one below:
Some words##Again words##More of this||####||Some words##Again words##Other
The idea is to find the word combinations that appear exactly twice and replace them by 0 so I'm left with the ones that appear only once, either on the left side of the ||####|| or on the right side. The result of the query should be something like this:
Highlighted
Some words##Again words##More of this||####||Some words##Again words##**Other**
Replaced
0##0##More of this||####||0##0##Other
To give you some more information about the concatenation: the left side (before the ||####||) is my current customer record, while on the right hand side I have the previous version. By making the replacements I can reveal any differences between customer records.
I have tried to get this done by using:
regexp_replace: this does not work entirely with REGEXP_REPLACE(MY STRING,'((Some words){1,2})|((Again words){1,2})','0',1,0) as for some reason the string parts in my first record are never correctly replaced. I'm also hitting the limits of this function due to the number of word combinations I need to match;
nested CASE WHEN: does not work either obviously as CASE WHEN - even nested - stops when the first match is found but I need to have all conditions checked and replaced.
I have thought about using subselects, but as this query uses one of the largest tables in my schema, this will not be usable except on a per customer basis. And it might still not work...
Some more information in order to find a solid, performant solution:
I have 34 possible word combinations to match
I have no idea which ones will be there, ever, except when I run the query obviously
I have no idea in which order they will be in the concatenated string
I hope this is clear. Anyone with some magical ideas?
Thanks in advance
You can use a recursive sub-query factoring clause to replace one duplicated term at each iteration:
WITH replaced ( value, start_char ) AS (
SELECT REGEXP_REPLACE(
value,
'(##|^)([^#]+?)((##[^#]+?)*\|\|####\|\|([^#]+?##)*)\2(##|$)',
'\10\30\6',
1
),
REGEXP_INSTR(
value,
'(##|^)([^#]+?)((##[^#]+?)*\|\|####\|\|([^#]+?##)*)\2(##|$)',
1
)
FROM table_name
UNION ALL
SELECT REGEXP_REPLACE(
value,
'(##|^)([^#]+?)((##[^#]+?)*\|\|####\|\|([^#]+?##)*)\2(##|$)',
'\10\30\6',
start_char + 1
),
REGEXP_INSTR(
value,
'(##|^)([^#]+?)((##[^#]+?)*\|\|####\|\|([^#]+?##)*)\2(##|$)',
start_char + 1
)
FROM replaced
WHERE start_char > 0
)
SELECT value
FROM replaced
WHERE start_char = 0;
Which, for the sample data:
CREATE TABLE table_name ( value ) AS
SELECT 'Some words##Again words##More of this||####||Some words##Again words##Other' FROM DUAL UNION ALL
SELECT '333##123##789##555||####||123##456##789##222##333' FROM DUAL;
Outputs:
| VALUE |
| :------------------------------------ |
| 0##0##More of this||####||0##0##Other |
| 0##0##0##555||####||0##456##0##222##0 |
Explanation:
The regular expression matches:
(##|^) either two # characters or the start of the string ^ (in the first capturing group ());
([^#]+?) one-or-more characters that are not # (in the second capturing group ());
( the start of the 3rd capturing group;
(##[^#]+?)* two # characters followed by one-or-more non-# characters (in the 4th capturing group ()) all repeated zero-or-more * times;
\|\|####\|\| then two | characters, four # characters and two | characters;
([^#]+?##)* then one-or-more non-# characters followed by two # characters (in the 5th capturing group ()) all repeated zero-or-more * times;
) the end of the 3rd capturing group;
\2 a duplicate of the 2nd capturing group; then
(##|$) either two # characters or the end-of-the-string $ (in the 6th capturing group).
This is replaced by:
\10\30\6 which is the contents of the 1st capturing group then a zero (replacing the 2nd capturing group) then the contents of the 3rd capturing group then a second zero (replacing the matched duplicate) then the contents of the 6th capturing group.
Each iteration replaces a pair of duplicate terms in the string (if one exists), while REGEXP_INSTR finds the start of the match; the two results are stored in value and start_char respectively. The next iteration starts looking one character after the start of the previous match, so the search gradually moves across the string. When no more duplicate terms can be found, REGEXP_REPLACE performs no replacement, REGEXP_INSTR returns 0, and the iteration terminates.
The final query filters to return only the final level of the iteration (when all the duplicates have been replaced).
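To experiment with the pattern outside the database, the iteration can be approximated in Python. This is a rough sketch under assumptions: it replaces one match per step and restarts the search one character after the previous match, which mimics the idea of the recursive query but is not guaranteed to behave identically to Oracle's REGEXP_REPLACE in every edge case. The function name replace_duplicates is made up for the example.

```python
import re

# Same pattern as in the query; Python's re module accepts it unchanged.
PATTERN = re.compile(
    r"(##|^)([^#]+?)((##[^#]+?)*\|\|####\|\|([^#]+?##)*)\2(##|$)"
)


def replace_duplicates(value: str) -> str:
    """Replace each term that appears on both sides of ||####|| with 0."""
    pos = 0
    while True:
        match = PATTERN.search(value, pos)
        if match is None:
            return value  # corresponds to REGEXP_INSTR returning 0
        # \g<1>0 is Python's unambiguous spelling of "group 1, then a 0".
        replaced = match.expand(r"\g<1>0\g<3>0\g<6>")
        value = value[:match.start()] + replaced + value[match.end():]
        # Resume one character after the start of this match.
        pos = match.start() + 1


print(replace_duplicates(
    "Some words##Again words##More of this||####||Some words##Again words##Other"
))
```

Advancing pos matters: a term that has already become 0 on both sides would otherwise match again at the same place and the loop would never end.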

how to vlookup if prefix found in the list?

Hi.
How can I return the "company name" (column H) in column B if any of the "Prefix" values (column G) is found in the "con no" (column A)?
A sample of the required outcome is shown in column B.
Sample:
620011113 = DD
CN1234 = BB
thanks
=INDEX($H:$H,AGGREGATE(15,6,ROW($G$1:$G$7)/(--(FIND($G$1:$G$7,$A2)=1)*--(LEN($G$1:$G$7)>0)),1),1)
Breaking this down, the INDEX retrieves the Nth item from Column H (Company name). To find the value of N, we are using the AGGREGATE function
AGGREGATE is a weird function - it lets us use things like MAX or LARGE or SUM while ignoring any error values. In this case, we will be using it for SMALL (first argument, 15), while Ignoring Error Values (second argument, 6). We will want the very smallest value, so the fourth argument will be 1. (If we wanted the second smallest, it would be 2, and so on)
=INDEX($H:$H,AGGREGATE(15,6, <SOMETHING> ,1),1)
So, all we need now is a list of values to compare! To make things slightly simpler, I'll break that bit of the code out for you here:
ROW($G$1:$G$7) / (--(FIND($G$1:$G$7,$A2)=1) * --(LEN($G$1:$G$7)>0))
There are 3 parts to this. The first, ROW($G$1:$G$7), is the actual value we want to retrieve - these will be the Row Numbers for each Prefix that matches your value. On its own, however, it will be all the row numbers. Since we are skipping errors, we want any Rows that don't match the prefix to throw an error. The easiest way to do this is to Divide by Zero.
At the start of --(FIND($G$1:$G$7,$A2)=1) and --(LEN($G$1:$G$7)>0) we have a double-negative. This is a quick way to convert True and False to 1 and 0. Only when both tests are True will we not divide by 0, as this table shows:
A | B | A*B
1 | 1 | 1
1 | 0 | 0
0 | 1 | 0
0 | 0 | 0
Starting with the second test first (it's easier), we have LEN($G$1:$G$7)>0 - basically "don't look at blank cells".
The other test (FIND($G$1:$G$7,$A2)=1) will search for the Prefix in the Con No, and return where it is found (or a #VALUE! error if it isn't). We then check "is this at position 1" - in other words, "Is this at the start of the Con No, rather than in the middle". We don't want to say Con No CNQ6060 is part of Company AA instead of Company BB by mistake!
So, if the Prefix is at the Start of the Con No, AND it isn't Blank (because there is an infinite amount of Nothing Before, After, and Between every number and letter), then we get it added to our list of Rows. We then take the smallest row (i.e. closest to the top - change AGGREGATE(15 to AGGREGATE(14 if you want the closest to the bottom!), and use that to get the Company Name
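The selection rule the formula implements (first non-blank prefix, from the top, that sits at the very start of the Con No) can be sketched in Python. The table below is hypothetical: the original sheet is not reproduced here, so the prefixes and company names are stand-ins chosen to match the samples in the question.

```python
from typing import Optional

# Hypothetical stand-in for columns G (prefix) and H (company name).
PREFIX_TABLE = [
    ("Q6", "AA"),
    ("CN", "BB"),
    ("", ""),        # blank row, skipped like the LEN(...)>0 test
    ("6200", "DD"),
]


def company_for(con_no: str) -> Optional[str]:
    """Return the company of the first prefix that starts con_no."""
    for prefix, company in PREFIX_TABLE:
        # "prefix and ..." mirrors LEN>0; startswith mirrors FIND(...)=1.
        if prefix and con_no.startswith(prefix):
            return company
    return None  # the Excel formula would return an error instead


print(company_for("620011113"))  # DD
print(company_for("CN1234"))     # BB
print(company_for("CNQ6060"))    # BB, not AA: "Q6" is not at the start
```

Like FIND, str.startswith is case-sensitive, so the two behave alike for mixed-case prefixes.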
You could try the formula below:
=VLOOKUP(IF(LEFT(A3,1)="6",LEFT(A3,4),IF(LEFT(A3,1)="C",LEFT(A3,2),IF(LEFT(A3,1)="E",LEFT(A3,7)))),$G$3:$H$7,2,0)
Keep in mind that you have to type ' before the cell values in columns A and G to convert them to text; otherwise VLOOKUP will not return the correct results.

How to start a string at a certain character and end it at a certain character? (Lua)

Here's my question. I'm using Lua and I have a string that looks something like this:
"Start1.2.3.4.5-1.2.3.4.5-1.2.3.4.5-1.2.3.4.5-1.2.3.4.5End"
The five numbers between each hyphen are all paired to the same "object" but each represents a separate set of data. The period between the numbers separates the data.
So after Start, 1 = our first value, 2 = our second value, 3 = our third value, 4 = our fourth value, and 5 = our fifth value. These 5 values are stored to the same object. Then we hit our first hyphen which separates the "objects". So there's 5 objects and 5 values per object.
I used 1.2.3.4.5 as an example but these numbers will be randomized with up to 4 digits. So it could say something like Start12.3.100.1025.50- etc...
Hopefully that makes sense. Here's what I have done so far:
MyString = the long string I posted above
local extracted = string.match(MyString, "Start(.*)")
This returns everything beyond Start in the string. However, I want it to return everything after Start and then cut off once it reaches the next hyphen. Then from that point on I'll repeat the process but instead find everything between the hyphens until I reach End. I also need to filter out the periods. Also, the hyphens/periods can change to something else as long as they aren't numbers.
Any ideas on how to do this?
Just use a pattern that captures any run of digits and periods.
"([%d%.]+)" Note that the period is escaped with % because it is a magic character in Lua patterns (inside a set like this one the escape is optional, but harmless).
local text = "Start1.2.3.4.5-1.2.3.4.5-1.2.3.4.5-1.2.3.4.5-1.2.3.4.5End"
for set in text:gmatch("([%d%.]+)") do
  print(set)
  local numbers = {}
  for num in set:gmatch("%d+") do
    table.insert(numbers, num)
  end
  print(table.unpack(numbers))
end
prints:
1.2.3.4.5
1 2 3 4 5
1.2.3.4.5
1 2 3 4 5
1.2.3.4.5
1 2 3 4 5
1.2.3.4.5
1 2 3 4 5
1.2.3.4.5
1 2 3 4 5
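If the goal is a structured result rather than printed lines, the same two-level matching can be sketched in Python. This is only an illustration of the idea, assuming the period stays the value separator; the function name parse_objects is made up for the example.

```python
import re
from typing import List


def parse_objects(text: str) -> List[List[int]]:
    """Return one list of integer values per object in the string."""
    objects = []
    # Outer match: runs of digits and periods, one run per object.
    for group in re.findall(r"[\d.]+", text):
        # Inner match: the individual numbers inside that run.
        objects.append([int(n) for n in re.findall(r"\d+", group)])
    return objects


print(parse_objects("Start12.3.100.1025.50-1.2.3.4.5End"))
# [[12, 3, 100, 1025, 50], [1, 2, 3, 4, 5]]
```

Anything that is neither a digit nor a period ends an object, so the hyphen can be swapped for another non-numeric separator without changing the code.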

SSIS: Delete everything to left of a character in a string

Note: this is SSIS not sql server
I am pulling data from a file and some columns have names like this:
1;&count chocula
13;&roger ramjet
123;&mary smith
45678;&john adams
How do I remove the ampersand and everything to the left of it?
I am using the fx transformation for the character.
I thought about finding the character position of the ampersand and then deleting everything from the start to that position, but SSIS does not have that function. The ampersand can be at any position; I cannot say it is guaranteed to be in position such and such.
Thanks
The RIGHT() function retrieves the last X characters of a string.
RIGHT("13;&roger ramjet",12) = roger ramjet
Above, X equals 12. Of course, twelve won't work for every string. Instead we can calculate X by subtracting the position of the ampersand from the string length.
LEN([MyColumn]) = 16
FINDSTRING([MyColumn],"&",1) = 4
Or put another way...
RIGHT([MyColumn], LEN([MyColumn]) - FINDSTRING([MyColumn],"&",1)) = roger ramjet
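The arithmetic can be checked with a small Python sketch. The helper findstring is a made-up stand-in that imitates FINDSTRING with an occurrence of 1 (1-based position, 0 when the character is absent); it is not an SSIS API.

```python
def findstring(text: str, needle: str) -> int:
    """1-based position of needle in text, or 0 if it does not occur."""
    return text.find(needle) + 1


def strip_through_ampersand(text: str) -> str:
    """Keep only what follows the first ampersand."""
    keep = len(text) - findstring(text, "&")  # X in RIGHT(text, X)
    return text[len(text) - keep:]            # the last `keep` characters


for name in ["1;&count chocula", "13;&roger ramjet", "45678;&john adams"]:
    print(strip_through_ampersand(name))
```

When there is no ampersand, the position is 0, so the whole string is kept; the sketch behaves the same way.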

Is it possible to check a generic part of a string in Excel?

Given an IP in string form (XX.XXX.YYY.XX) is it possible to check the values of YYY and copy them into an adjacent cell?
The value will either be 2 or 3 characters long and will always come after the 2nd period.
Answered below by Jerry. Thank you again Jerry!
You can extract YYY into a cell with the formula:
=MID(A1,FIND("#",SUBSTITUTE(A1,".","#",2))+1,FIND(".",A1,FIND("#",SUBSTITUTE(A1,".","#",2))+1)-FIND("#",SUBSTITUTE(A1,".","#",2))-1)
Assuming the IP is in A1.
It works for whatever length of IP you have.
EDIT: Some details:
There are some recycled bits of formula here. SUBSTITUTE(A1,".","#",2) returns XX.XXX#YYY.XX (it substitutes the 2nd occurrence of the dot in A1 with #). We'll use this in the big formula, so let's call it R for the time being.
This turns the formula into:
=MID(A1,FIND("#",R)+1,FIND(".",A1,FIND("#",R)+1)-FIND("#",R)-1)
        ^-----------^ ^--------------------------------------^
        | 1: Start  | |              2: Length               |
Much better!
The starting position part:
FIND("#",R)+1 returns the position of the character just after #, so that MID starts with the first Y. For XX.XXX#YYY.XX, the position is 8.
The length part:
FIND(".",A1,FIND("#",R)+1) There's a formula we already used here: FIND("#",R)+1 is 8, so we have FIND(".",A1,8). This finds the position of the first dot in A1 at or after the 8th character, which is 11.
FIND("#",R) should be familiar: it gets the position of # in R, which is 7.
11-7 gives 4, which is one character longer than what we're looking for (because we're dealing with ranked positions; e.g. the length of the string between the 1st and 3rd characters of a string is 1, but 3-1 gives 2).
Hence the final -1 part.
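The position arithmetic can be replayed step by step in Python. The helpers below (substitute_nth, find, mid) are made-up imitations of SUBSTITUTE, FIND and MID with 1-based positions, written only to mirror the formula; they are not Excel APIs.

```python
def substitute_nth(text: str, old: str, new: str, n: int) -> str:
    """Replace only the n-th occurrence of old, like SUBSTITUTE(..., n)."""
    pos = -1
    for _ in range(n):
        pos = text.index(old, pos + 1)
    return text[:pos] + new + text[pos + len(old):]


def find(needle: str, text: str, start: int = 1) -> int:
    """1-based position of needle at or after start, like FIND."""
    return text.index(needle, start - 1) + 1


def mid(text: str, start: int, length: int) -> str:
    """length characters beginning at 1-based position start, like MID."""
    return text[start - 1:start - 1 + length]


def third_part(ip: str) -> str:
    r = substitute_nth(ip, ".", "#", 2)              # R in the explanation
    start = find("#", r) + 1                         # first character after #
    length = find(".", ip, start) - find("#", r) - 1
    return mid(ip, start, length)


print(third_part("XX.XXX.YYY.XX"))  # YYY
print(third_part("10.123.45.67"))   # 45
```

Note that Python's built-in str.replace with a count would replace the first two dots rather than only the second, which is why a separate helper is used here.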

Resources