Extract only the last two numeric values from a string variable in SAS

I want to extract only the last two numeric values (digits) from a string variable and assign them to a new variable. First, I extracted all the numeric values from the string using the code below and assigned the result to a new variable, but ultimately I want only the last two digits. Is there a better way to do this?
UI_DUM = input(compress(Prod_Desc,,"kd"),best.);
One more question: how do I assign a temp variable for doing some manipulation work in SAS?

Here is the code.
You are doing it right by removing the characters and keeping only the digits. The code below does the same for the variable "temp1".
In the second step, the length function calculates the total length of the string, which now contains only digits. In the third step, the substr function extracts the last two digits.
If you want to do it in one statement, the "final" variable is the answer.
LENGTH Function - Returns the length of a non-blank character string, excluding
trailing blanks, and returns 1 for a blank character string
COMPRESS function with the "kd" modifiers - keeps only digits.
COMPRESS(source<, chars><, modifiers>)
Modifier - specifies a character constant, variable, or expression in which each non-blank character modifies the action of the COMPRESS function. Blanks are ignored. The following characters can be used as modifiers.
d or D adds digits to the list of characters.
k or K keeps the characters in the list instead of removing them
substr function - Extracts a substring from an argument -
SUBSTR(string, position<,length>)
data _null_;
Test_string="ada13117a1w11da1286s";
temp1=compress(Test_string, , 'kd');
temp2=length(temp1);
temp3=substr(temp1,temp2-1,2);
final=substr(compress(Test_string, , 'kd'),length(compress(Test_string, , 'kd'))-1,2);
put _all_;
run;
Regarding the temp variable, SAS has no special temporary variable type. Just use any variable name and drop it from the final dataset with the drop option, like below:
data test(drop = temp); /* temp works as the temporary variable */
set have; /* assumed input dataset that contains balance */
temp = 2*balance; /* just for example */
/* use temp in further calculations */
run;

A somewhat different take:
data want;
set have;
UI_DUM = input(compress(Prod_Desc,,"kd"),best.);
UI_DUM_last2 = mod(UI_DUM,100);
run;
You could of course do all of that in one line as well. This uses the numeric modulo function to return the last 2 digits (any non-negative integer modulo 100 gives its final 2 digits). Note that the result is numeric, so a final pair such as 05 comes back as 5.
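For readers who want to check the logic outside SAS, both approaches above can be sketched in Python. This is only an illustration under assumptions: the function names are hypothetical, and the regular expression stands in for compress with the "kd" modifiers.

```python
import re


def last_two_digits(text: str) -> str:
    """Keep only the digit characters of text and return the last two."""
    digits = re.sub(r"[^0-9]", "", text)  # same idea as compress(text, , 'kd')
    return digits[-2:]  # if fewer than two digits exist, returns what is there


def last_two_as_number(text: str) -> int:
    """Modulo variant: numeric result, so a leading zero is not preserved."""
    digits = re.sub(r"[^0-9]", "", text)
    return int(digits) % 100  # raises ValueError when text has no digits


print(last_two_digits("ada13117a1w11da1286s"))     # 86
print(last_two_as_number("ada13117a1w11da1286s"))  # 86
```

The string variant keeps a pair such as 05 intact, while the modulo variant returns 5, which is the main practical difference between the two answers.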

Related

Extract substrings from irregular text in Excel cell

I am trying to solve this problem -
If suppose I have text like this in a single column on Excel
#22-atr$$1 AM**01-May-2015&&
$21-atr#10-Jan-2007*6 PM&
&&56-atr#11 PM$$8-Jan-2016*
**4 PM#68-atr#21-Mar-2022&&
and I want to write functions to have separate columns as follows
Can someone help me do that please?
I am trying to solve this problem, and the only thing I was able to arrive at is extracting the month by using =MID(A1,FIND("-",A1)+1,3)
One option for formulae would be using new functions, currently available in the BETA-channel for insiders:
Formula in B1:
=LET(A,TEXTSPLIT(A1,{"#","$","&","*","#"},,1),B,SORTBY(A,IFERROR(MATCH(RIGHT(A),{"r","M"},0),3)),C,HSTACK(TAKE(B,,2),TEXTSPLIT(TEXT(--INDEX(B,3),"YYYY-Mmm-D"),"-")),IFERROR(--C,C))
The idea is to:
Use LET() throughout to store variables;
TEXTSPLIT() the value in column A using all available delimiters into columns and skip empty values in the resulting array;
Then SORTBY() the resulting three elements by their rightmost character using MATCH(). The IFERROR() will catch the date-string;
We can then HSTACK() the 1st and 2nd column with the result of splitting the 3rd element after formatting it as YYYY-Mmm-D first;
Finally, we try to convert the resulting array to numbers with a double unary. Where that fails, we keep the original content from the previous variable.
Notes:
I formatted column C to hold time-value in AM/PM.
I changed the text to hold dutch month-names to have Excel recognize the dates for demonstration purposes. Should work the same with English names.
For fun, a UDF using regular expressions:
Public Function GetPart(inp As String, prt As Long) As Variant
Dim Pat As String
Select Case prt
Case 0
Pat = "(\d+-atr)"
Case 1
Pat = "(\d+\s*[AP]M)"
Case 2
Pat = "-(\d{4})"
Case 3
Pat = "-(\w+)-"
Case 4
Pat = "(\d+)-\w+-"
Case Else
Pat = ""
End Select
With CreateObject("vbscript.regexp")
' Lazy leading .*? so the captured group keeps all of its digits
.Pattern = ".*?" & Pat & ".*"
GetPart = .Replace(inp, "$1")
End With
End Function
Invoke through =GetPart(A1,0). Choices are 0-4, in the order of your column headers.
You can achieve what you wish by applying a few simple transformations.
Replace the #, $, * and & with a common character that is guaranteed not to appear in the data sections (e.g. #)
Replace every run of 2 or more # characters with a single #
Trim the # from the start and end of the string
Split the string into an array using # as the split character (vba.split)
Use For Each to loop over the array
In the loop, have a set of three tests
Test 1 tests the string for the occurrence of "-atr"
Test 2 tests the string for the occurrence of "-XXX-", where XXX is a three-letter month. You then split the date at the - to give an array with Day/Month/Year
Test 3 tests if the string has ' AM' or ' PM'
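The steps above can be sketched in Python to show the flow before porting it to VBA. This is a minimal sketch, not the answerer's code: the function name and the dictionary keys are hypothetical (the question's column headers are not shown), and English three-letter month names are assumed.

```python
import re

# Assumption: English three-letter month abbreviations
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"


def split_cell(cell: str) -> dict:
    """Split one irregular cell into its code, time and date parts."""
    # Steps 1-3: map #, $, * and & to one delimiter, collapse runs, trim the ends
    normalized = re.sub(r"[#$*&]+", "#", cell).strip("#")
    result = {}
    # Steps 4-5: split on the delimiter and test each piece
    for part in normalized.split("#"):
        if "-atr" in part:  # Test 1
            result["code"] = part
        elif re.fullmatch(rf"\d{{1,2}}-(?:{MONTHS})-\d{{4}}", part):  # Test 2
            day, month, year = part.split("-")
            result.update(day=int(day), month=month, year=int(year))
        elif part.endswith((" AM", " PM")):  # Test 3
            result["time"] = part
    return result


print(split_cell("#22-atr$$1 AM**01-May-2015&&"))
```

Because each piece is identified by its content rather than its position, the order of the parts within the cell does not matter.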

Format in Python

I have a list of values as follows:
no column
1. 111-222-11
2. 112-333-12
3. 113-444-13
I want to format the value from 111-222-11 to 111-222-011 and format the other values similarly. Here is my code snippet in Python 3, which I am trying to use for that:
‘{:03}-{:06}-{:03}.format(column)
I hope that you can help.
Assuming that column is a variable that can be assigned string values 111-222-11, 112-333-12, 113-444-13 and so on, which you want to change to 111-222-011, 112-333-012, 113-444-013 and so on, it appears that you tried to use a combination of slice notation and format method to achieve this.
Slice notation
Slice notation, when applied to a string, treats it as a list-like object consisting of characters. The positional index of a character from the beginning of the string starts from zero. The positional index of a character from the end of the string starts with -1. The first colon : separates the beginning and the end of a slice. The end of the slice is not included into it, unlike its beginning. You indicate slices as you would indicate indexes of items in a list by using square brackets:
'111-222-11'[0:8]
would return
'111-222-'
When a slice starts at the first character or ends at the last one, that index is usually omitted and implied by the colon.
Knowing the exact position where you need to add a leading zero before the last two digits of a string assigned to column, you could do it just with slice notation:
column[:8] + '0' + column[-2:]
format method
The format method is a string formatting method. It is called on a string, so the template must be enclosed in matching single or double quotes:
'your output string here'.format('your input string here')
The numbers in the curly brackets are not slices. They are placeholders, where the strings, which are passed to the format method, are inserted. So, combining slices and format method, you could add a leading zero before the last two digits of a column string like this:
'{0}0{1}'.format(column[:8], column[-2:])
Making more slices is not necessary because there is only one place where you want to insert a character.
split method
An alternative to slicing would be using split method to split the string by a delimiter. The split method returns a list of strings. You need to prefix it with * operator to unpack the arguments from the list before passing them to the format method. Otherwise, the whole list will be passed to the first placeholder.
'{0}-{1}-0{2}'.format(*column.split('-'))
It splits the string into a list, treating - as the separator, and inserts each item into a new string with a 0 character added before the last one.
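The examples above always insert exactly one 0. If the last part can vary in length, padding it to a fixed width may be more robust. A minimal sketch, assuming the values are dash-separated and the target width is 3; the function name pad_last_part is hypothetical:

```python
def pad_last_part(value: str, width: int = 3) -> str:
    """Zero-pad the final dash-separated part of value to width characters."""
    head, sep, last = value.rpartition("-")
    return head + sep + last.zfill(width)


for column in ["111-222-11", "112-333-12", "113-444-13"]:
    print(pad_last_part(column))

# The same padding with a format specification: fill with 0, right-align, width 3
print("{}-{}-{:0>3}".format(*"111-222-11".split("-")))
```

Both forms print 111-222-011 for the first value, and a last part that already has three characters is left unchanged.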

How to print a number within a string in matlab

I would like to use the command text to type numbers within 57 hexagons. I want to use a loop:
for mm=1:57
text(x(m),y(m),'m')
end
where x(m) and y(m) are the coordinates of the text .
The script above types the string "m" and not the value of m. What am I doing wrong?
Jubobs pretty much told you how to do it. Use the num2str function. BTW, small typo in your for loop. You mean to use mm:
for mm=1:57
text(x(mm),y(mm),num2str(mm));
end
The reason I decided to post an answer at all is that you can do this vectorized, without a loop. What you can do is place each number into a character array where each row denotes a unique number, and then use text to print out all numbers simultaneously.
m = sprintfc('%2d', 1:57);
d = reshape([m{:}], 2, 57).';
text(x, y, d);
The (undocumented!) function sprintfc takes a formatting specifier and an array, and creates a cell array of strings where each cell is the string version of the corresponding element in the array you supply. To ensure that the character array has the same number of columns in every row, each string takes up 2 characters, so any number less than 10 has a blank space at the beginning. I then convert the cell array of strings into a character array by expanding the cell array into a comma-separated list of strings and reshaping the result into an acceptable form. Finally, I call text with all of the pairs of x and y and the corresponding labels in d, so that everything is placed on the screen together.

Extract specific data from field in Access table

Every 2 weeks I need to import an Excel file into an Access 2007 database. The 2nd cell in the Excel file, A2, always contains different information. It always starts with AS OF PAY PERIOD XX, where XX stands for the pay period. Once it is imported into an Access table, I need to extract the pay period. The pay period always seems to be in position 18, and it is always 2 characters long. Is there an easy way to extract that information with a string function? Thanks.
http://office.microsoft.com/en-us/access-help/mid-function-HA001228881.aspx
Returns a Variant (String) containing a specified number of characters from a string.
Syntax
Mid(string, start [, length ] )
The Mid function syntax has these arguments :
string - Required. string expression from which characters are returned. If string contains Null, Null is returned.
start - Required. Long - Character position in string at which the part to be taken begins. If start is greater than the number of characters in string, Mid returns a zero-length string ("").
length - Optional. Variant (Long) - Number of characters to return. If omitted or if there are fewer than length characters in the text (including the character at start), all characters from the start position to the end of the string are returned.
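Use the Mid function within a query, a SQL statement, or on the field data element from a recordset process. For the string described in the question, Mid([YourField], 18, 2) returns the two-character pay period, where YourField is a placeholder for the actual field name.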
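To make the 1-based start position concrete, here is a small Python sketch that mimics the documented behaviour of Mid. The helper name mid and the sample pay period 07 are assumptions for illustration only.

```python
from typing import Optional


def mid(string: str, start: int, length: Optional[int] = None) -> str:
    """Mimic the 1-based Mid(string, start, length) described above."""
    if start < 1:
        raise ValueError("start must be 1 or greater")
    begin = start - 1  # Mid counts from 1, Python slices count from 0
    if length is None:
        return string[begin:]
    return string[begin:begin + length]


print(mid("AS OF PAY PERIOD 07", 18, 2))  # 07
```

As in the documentation quoted above, a start position beyond the end of the string yields a zero-length string.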

Group digits in currency and remove leading zeroes

I want to know how to do two things:
Digit grouping: when I have a value for money, for example 3000000 (3 million), I want to print 3.000.000 on the screen (a dot every three characters, counting from the last character).
Removing zeroes in front of a value: when I select a value from a table and print it on the screen, the value gets padded with zeroes automatically, e.g. 129 becomes 0000129.
The WRITE statement allows you to specify a currency. Example:
DATA price TYPE p DECIMALS 2.
price = '3000000'.
WRITE: / price CURRENCY 'USD'.
Note that this does not interpret the number itself, but just adds commas and dots at certain positions depending on the currency you specify. So if you have an integer with the value 3000000 and you write it with currency USD, the result will be 30.000,00.
I suggest you read the F1 help information on the WRITE statement, because there are a lot more options besides this one.
--
Removing leading zeroes is done by using a conversion routine.
The CONVERSION_EXIT_ALPHA_INPUT will add leading zeroes and CONVERSION_EXIT_ALPHA_OUTPUT will remove them.
It is possible to add these routines to a Domain in the dictionary, so the conversion will be done automatically. For example the MATNR type:
DATA matnr TYPE matnr.
matnr = '0000129'.
WRITE: / matnr.
This will output 129 because the Domain MATNR has a conversion routine specified.
In the case of a type which does not have this, for example:
DATA value(7) TYPE n.
value = '0000129'.
WRITE: / value.
The output will be 0000129. You can call the CONVERSION_EXIT_ALPHA_OUTPUT routine to achieve the output without leading zeroes:
DATA value(7) TYPE n.
value = '0000129'.
CALL FUNCTION 'CONVERSION_EXIT_ALPHA_OUTPUT'
EXPORTING
input = value
IMPORTING
output = value.
WRITE: / value.
Please also note that output conversion for numberlike types - triggered by the WRITE statement - is controlled by a property in the user master data.
Decimal separator and digit grouping should be configured there.
You could check this in the user master transactions e.g. SU01 or SU01D.
For removing the zero padding, use the NO-ZERO addition. For the thousands separator I do not see any problem, because that is the standard way ABAP prints values of type P. Here is some sample code.
REPORT ZZZ.
DATA:
g_n TYPE n LENGTH 10 VALUE '129',
g_p TYPE p LENGTH 12 DECIMALS 2 VALUE '3000000'.
START-OF-SELECTION.
WRITE /: g_n, g_p.
WRITE /: g_n NO-ZERO, g_p.
This produces the output.
000000129
3.000.000,00
129
3.000.000,00
For removing leading zeros, you can do the following:
data: lv_n type n length 10 value '129'.
shift lv_n left deleting leading '0'.
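The two transformations themselves are independent of ABAP and can be sketched in Python to make the expected results explicit. This is only an illustration under assumptions: the function names are hypothetical, and the dot separator is passed explicitly instead of coming from user settings.

```python
def group_digits(amount: int, separator: str = ".") -> str:
    """Insert a separator every three digits, counting from the right."""
    return f"{amount:,}".replace(",", separator)


def strip_leading_zeros(text: str) -> str:
    """Remove zero padding, keeping a single 0 for an all-zero value."""
    return text.lstrip("0") or "0"


print(group_digits(3000000))           # 3.000.000
print(strip_leading_zeros("0000129"))  # 129
```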
