Pentaho convert string to integer with decimal point

I am importing text values into a transformation using a Fixed Width input step. Everything is coming in as a string. I want to convert some of the string values to integers with a decimal point at a specified spot. Here are some examples of the before (left hand side) and expected results (right hand side):
00289 --> 0028.9
01109 --> 0110.9
003201 --> 0032.01
I've tried numerous combinations of the Format mask in a Select Values step (meta data tab) but I can't get the values I'm looking for.
Can anyone tell me what combination of Type/Length/Precision/Format/Encoding/Decimal/Group attributes I can try for these fields to get the desired output?

Have you tried another step to reach your goal? For example, you can use a User Defined Java Expression step configured like this:
Java expression: new java.math.BigDecimal(text.substring(0,4) + "." + text.substring(4,text.length()))
Value type: BigNumber
But this will convert your input to:
00289 --> 28.9
01109 --> 110.9
003201 --> 32.01
This is because the output type is BigNumber, so the leading zeros are not kept. BigNumber or Number can be used for decimal values; Integer cannot, because it has no fractional part.
If you want a String output, leave out the new java.math.BigDecimal() part of the expression above and set Value type to String. It will produce these results:
00289 --> 0028.9
01109 --> 0110.9
003201 --> 0032.01
This is just one suggestion; there are of course other ways to reach your goal.
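To check the slicing logic outside Pentaho, the same idea can be sketched in Python. This is only an illustration of the expression above, not a Pentaho step; the function name insert_decimal and the default position 4 are assumptions taken from the examples.

```python
from decimal import Decimal


def insert_decimal(text: str, position: int = 4) -> str:
    """Insert a decimal point after `position` characters, keeping leading zeros."""
    return text[:position] + "." + text[position:]


for raw in ["00289", "01109", "003201"]:
    as_string = insert_decimal(raw)  # string result keeps the zero padding
    as_number = Decimal(as_string)   # numeric result loses the zero padding
    print(raw, "-->", as_string, "/", as_number)
```

The string form corresponds to the String output and the Decimal form to the BigNumber output described above.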

Related

How to stop Python Pandas from converting specific column from int to float

I am trying to output a DataFrame to a txt file (for a feed). A few specific columns are automatically converted to float instead of int as intended.
How can I specify that those columns use int as their dtype?
I tried to output the whole DataFrame as strings, and that did not work.
The columns I would like to specify are named [CID1] and [CID2].
data = pd.read_sql(sql,conn)
data = data.astype(str)
data.to_csv('data_feed_feed.txt', sep ='\t',index=True)
Based on the code you provided, you turn all of your data into strings just before export.
Thus, you either need to turn some columns back to the desired type, such as:
data["CID1"] = data["CID1"].astype(int)
or not convert them in the first place.
It is not clear from what you provided why you'd have issues with ints being converted to floats.
This post provides a lot of information:
stackoverflow.com/a/28648923/9249533
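One common cause of ints turning into floats is missing values, since pandas stores an integer column that contains NULLs as float64. Whether that applies here is an assumption, because the question does not show the data. A minimal sketch using the nullable Int64 dtype, with a small hypothetical DataFrame standing in for the pd.read_sql result:

```python
import io

import pandas as pd

# Hypothetical stand-in for pd.read_sql(sql, conn); CID1 has a missing value.
data = pd.DataFrame(
    {"CID1": [101, None, 103], "CID2": [7, 8, 9], "name": ["a", "b", "c"]}
)
print(data.dtypes["CID1"])  # float64, because of the missing value

# Cast only the ID columns to pandas' nullable integer dtype.
data = data.astype({"CID1": "Int64", "CID2": "Int64"})

# Written to an in-memory buffer here; pass a file path for a real feed.
buffer = io.StringIO()
data.to_csv(buffer, sep="\t", index=True)
feed_text = buffer.getvalue()
print(feed_text)
```

With Int64, missing values are written as empty fields and the remaining values keep their integer form.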

How set a standard format specifier to be used throughout a module when using f-strings?

I am using f-strings for formatting strings and injecting variable values into the string. Is there a way to set a format spec for an entire module in Python?
I am aware of Standard Format Specifiers which can be used to specify formats for each string, but how do I do it at once for the whole module?
Eg:
f""" Some random string {value1}, some more text {value2:.2f} ... """
Here I am specifying the format for value2, but I want to set a format for all globally.
f"""% profits are {profit}"""
Had I set the format spec to {profit:.2f}, this would be displayed with two decimal places, but I want to set that :.2f globally so that the number of decimal places can be changed with one variable update.
Something like format-spec = ':.2f', and all the f-string injected values should be displayed as floats with two decimals (if they are numbers).
If I understand you correctly, you want to define a string format one time that is usable throughout a script. I would create a function, callable from anywhere in your script that produces the formatted string as shown below:
def create_string(v_str: str, v_num: float) -> str:
    return f""" Some random string {v_str}, some more text {v_num:.2f} ... """
You can then use the create_string function from anywhere in your script and produce the desired output as illustrated below:
print(create_string('Some words here', 2.56))
which yields:
Some random string Some words here, some more text 2.56 ...
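If the goal is to change the number of decimal places with a single variable, f-strings also accept a nested replacement field inside the format spec. The following is a minimal sketch; the names NUM_FORMAT and fmt are hypothetical, and it only approximates a "global" setting, since each value still has to pass through the helper.

```python
NUM_FORMAT = ".2f"  # hypothetical module-level setting; change it in one place


def fmt(value):
    """Format numbers with NUM_FORMAT; return other values as plain strings."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        # The inner braces substitute NUM_FORMAT into the format spec.
        return f"{value:{NUM_FORMAT}}"
    return str(value)


profit = 12.3456
message = f"% profits are {fmt(profit)}"
print(message)
```

Changing NUM_FORMAT to ".3f", for example, changes every string built through fmt without touching the individual f-strings.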

Convert varchar string to Currency format in db2 SQL

I have a column from which I have to extract a string and then format it as US currency with 2 decimal places.
For example :
Column value : {tag}0000020000890|
From this, I have to match the tag and extract 20000890, and format it to 200,008.90
I have extracted the part with the code below:
LTRIM(REGEXP_SUBSTR('match pattern', 1,1,'i',1), '0')
Where match pattern is '\{tag\}(.*?)\|'
With this, I am able to extract 20000890
Then I applied the to_char function on top of it to format the value as comma-separated currency with 2 decimal places:
to_char(ltrim(Regexp_substr('match pattern',1,1,'i',1),'0'), '99G999G999D99')
But this throws the error below:
Sql error -20447, sqlstate 22007 sqlerrmc 99G999G999D99
Sysibm.Varchar-format
Then I tried,
to_char(to_number(ltrim(Regexp_substr('match pattern',1,1,'i',1),'0')), '99G999G999D99')
But this also throws an error:
Sql error -20476, sqlstate 22018 sqlermc DECFLOAT_FORMAT; 99G999G999D99
I'm not sure what causes this error.
The format that you try to use is supported starting from V11.5 only.
TO_CHAR V11.5
TO_CHAR V11.1
Compare "Table 2. Format elements for decimal floating-point to varchar" in both links.
Moreover, you must cast the string to a numeric value in the first parameter of TO_CHAR:
SELECT TO_CHAR(DECFLOAT(REGEXP_SUBSTR(V, '\{tag\}(.*?)\|', 1, 1, 'i', 1)), '99,999,999.99')
FROM (VALUES '{tag}0000020000890|') T(V);
Take a look at VARCHAR_FORMAT. It is the function TO_CHAR is mapped to. The group separator is not G, but "," or ".". Basically, you have to replace your formatting string 99G999G999D99 with something like 99,999,999.99.
The Db2 documentation has more examples on that.
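The intended transformation can be sketched outside Db2 to check the expected result. This Python illustration is not Db2 SQL; it assumes, based on the expected output 200,008.90 in the question, that the last two digits of the extracted value are the decimal places. The names TAG_PATTERN and to_currency are hypothetical.

```python
import re
from decimal import Decimal

# Same pattern as in the question: capture everything between {tag} and |.
TAG_PATTERN = re.compile(r"\{tag\}(.*?)\|", re.IGNORECASE)


def to_currency(column_value: str) -> str:
    """Extract the tagged digits and format them as comma-grouped currency."""
    match = TAG_PATTERN.search(column_value)
    if match is None:
        raise ValueError("tag not found in column value")
    digits = match.group(1)
    # Assumption: the last two digits are cents, so divide by 100.
    amount = Decimal(digits) / 100
    return format(amount, ",.2f")


print(to_currency("{tag}0000020000890|"))
```

Converting to a number removes the leading zeros, so no separate LTRIM step is needed in this sketch.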

Error converting string feature to numeric in Azure ML studio

The QuotedPremium column is a string feature, so I need to convert it to a numeric value in order to use the algorithm.
For that I am using the Edit Metadata module, where I specify Floating Point as the target data type.
After I run it, I get an error:
Could not convert type System.String to type System.Double, inner exception message: Input string was not in a correct format.
What am I missing here?
As mentioned in the comments, the column where numbers are stored as text must be converted to a numeric type, and it must not contain any null values. The steps below show how to substitute NULLs in the data using ML Studio and then convert the column to a numeric type.
Substitute NULLs in data
Use the Execute R Script module for that, and add this code to it.
dataset1 <- maml.mapInputPort(1); # class: data.frame
dataset1[dataset1 == "NULL"] = 0; # Wherever cell's value is "NULL", replace it with 0
maml.mapOutputPort("dataset1"); # return the modified data.frame
Convert to numeric data
As you have added in your answer, this can be done using the Edit Metadata module.
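Before converting, it can help to list which values actually fail to parse, since "NULL" may not be the only offender. The following is a minimal sketch in plain Python rather than an ML Studio module; the function name find_non_numeric and the sample values are hypothetical.

```python
def find_non_numeric(values):
    """Return the values that cannot be parsed as floating-point numbers."""
    bad = []
    for value in values:
        try:
            float(value)
        except (TypeError, ValueError):
            bad.append(value)
    return bad


# Hypothetical sample of a QuotedPremium-like column stored as text.
sample = ["12.5", "NULL", "1,200", "7"]
print(find_non_numeric(sample))
```

Any value reported here would need to be cleaned or replaced before a string-to-number conversion can succeed.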

extracting a string based on regex using xslt 1.0

I need to remove a particular string from a tag using XSLT 1.0. The string is random and can appear anywhere. The only way to identify the string is that it is followed by either "tt", "Tt", "tt." or "Tt."
Can anybody help me with a code snippet that I can use to achieve this?
For example
<Page>9 tt., 407-415</Page>
Expected output (remove 9 tt.)
<Page>, 407-415</Page>
<Page>425 Tt (approx.)</Page>
Expected output (remove 425 Tt)
<Page> (approx.)</Page>
<Page>055302, 8 tt.</Page>
Expected output (remove 8 tt.)
<Page>055302, </Page>
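To pin down the rule, the three examples can be reproduced with a regular expression in Python. This is not an XSLT 1.0 solution, because XSLT 1.0 has no regular expression support; it only illustrates the matching rule. It assumes, from the examples, that the unwanted string is a run of digits; the names PAGE_COUNT and strip_page_count are hypothetical.

```python
import re

# Assumption: digits, optional spaces, then "tt" or "Tt", then an optional period.
PAGE_COUNT = re.compile(r"\d+\s*[Tt]t\b\.?")


def strip_page_count(text: str) -> str:
    """Remove the first '<digits> tt' style fragment from the text."""
    return PAGE_COUNT.sub("", text, count=1)


for page in ["9 tt., 407-415", "425 Tt (approx.)", "055302, 8 tt."]:
    print(repr(strip_page_count(page)))
```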

Resources