I have a comma-separated list of values generated from an Excel sheet (numbers and RTL characters).
Having these values in columns: 1 | 2 | 3 | 4 | 5
would yield me the output of 1,2,3,4,5
But the issue arises when I have RTL characters (Persian/Arabic) in my columns: 1 2 ب الف and a 5 in the end.
Now the output becomes 1,2الف, ب, 5
Since my columns can have multiple sets of RTL characters, this can mess up the output to the point that it is no longer trivial to fix by substituting several inputs.
What are my options to produce a csv file with the right order?
The tools I used were JavaScript and Excel, and both had the same issue.
If your purpose is only to display the CSV for the human eye, you can add a RIGHT-TO-LEFT MARK (U+200F) before each number:
1, 2, ب, الف, 5
1, 2, ب, الف, 5
Note that these invisible characters may confuse any tool you use to parse the CSV.
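For display purposes, the mark can be added and removed again programmatically. The following is a minimal Python sketch, assuming plain comma-separated fields without quoting; the helper names `add_rlm` and `strip_rlm` are made up for illustration and are not part of any library.

```python
RLM = "\u200f"  # RIGHT-TO-LEFT MARK

def add_rlm(line: str) -> str:
    """Prefix every purely numeric field with an RLM (for display only)."""
    fields = line.split(",")
    marked = [RLM + f if f.strip().isdigit() else f for f in fields]
    return ",".join(marked)

def strip_rlm(line: str) -> str:
    """Remove the marks again before handing the line to a CSV parser."""
    return line.replace(RLM, "")

# Sample row written with escapes: 1, 2, beh, alef-lam-feh, 5
sample = "1,2,\u0628,\u0627\u0644\u0641,5"
marked = add_rlm(sample)
```

Stripping the marks before parsing avoids the parser problem mentioned above, since `strip_rlm(marked)` gives back the original line.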
I think your CSV file already has the right order. In the text you pasted in the question:
1,2الف, ب, 5
The "1" is the first character in the string, and the "5" is the last. It just doesn't seem that way to you because the first half of the string (1,2) is rendering LTR whereas the second half of the string (الف, ب, 5) is rendering RTL.
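One way to check this is to look at the stored (logical) order rather than the rendered one. Below is a small Python sketch, assuming a sample row built from the values in the question; it splits on the comma and prints each field with the Unicode bidirectional class of its first character.

```python
import unicodedata

# Sample row written with escapes: 1, 2, beh, alef-lam-feh, 5
line = "1,2,\u0628,\u0627\u0644\u0641,5"
fields = line.split(",")

# Bidi class of each field's first character:
# EN = European number, AL = Arabic letter
classes = [unicodedata.bidirectional(f[0]) for f in fields]

for index, (field, bidi) in enumerate(zip(fields, classes), start=1):
    print(index, repr(field), bidi)
```

The field indexes follow the order in the file, whatever direction a text editor uses to draw them.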
Using Pentaho data integration (Kettle), I read a long string from a text file:
a, 1, 2, b, 3, 4, c, 5, 6, ...
Are there any PDI/Kettle steps or methods to split this string into an n-column table like the one below (the column names can be defined freely)?
column1 | column2 | column3
a | 1 | 2
b | 3 | 4
c | 5 | 6
The above is just a simplified example; my real case has a different separator character and a larger number of columns (n). But I just want to get the main problem solved first.
I have prepared a SOLUTION for you. In my solution I have set N=3, but you can set it as high as you want. You also need to enter the column names in the 'Row denormaliser' step to match whichever N you choose.
Alternatively, you can make the column names dynamic, if you want, using the 'Meta Data Injection' step.
I didn't understand what you meant by "different separator character". If you have different separators on the same line, such as a comma and a semicolon, this is a tricky task for PDI. In that case you first need to convert all delimiters to the same character, for example with a find-and-replace in Notepad++, which copes well with large CSV files.
Beyond that, PDI has a standard step for splitting, "Split Fields".
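Outside of PDI, the reshaping itself is simple to express. This is only a Python sketch of the logic (split on the separator, then group every n values into a row), not a Kettle transformation; `split_into_rows` is a hypothetical helper name.

```python
def split_into_rows(text: str, n: int, sep: str = ",") -> list[list[str]]:
    """Split text on sep and group the values into rows of n columns."""
    values = [v.strip() for v in text.split(sep)]
    # The last row is shorter if the value count is not a multiple of n.
    return [values[i:i + n] for i in range(0, len(values), n)]

rows = split_into_rows("a, 1, 2, b, 3, 4, c, 5, 6", 3)
for row in rows:
    print(" | ".join(row))
```

A different separator or a larger n only changes the arguments, for example `split_into_rows(text, 7, sep=";")`.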
This is an example of the string, and it can be longer:
1160752 Meranji Oil Sats -Mt(MA) (000600007056 0001), PE:Toolachee Gas Sats -Mt(MA) (000600007070 0003)GL: Contract Services (510000), COT: Network (N), CO: OM-A00009.0723,Oil Sats -Mt(MA) (000600007053 0003)
The result needs to be column1 600007056 column2 600007070 column3 600007053
I am working in Spotfire and creating calculated columns through transformations, as I need the columns to join to other data sets.
I have tried the below, but it is only picking up the 1st 600.. number not the others, and there can be an undefined amount of those.
Account is the column with the string
Mid([Account],
Find("(000",[Account]) + Len("(000"),
Find("0001)",[Account]) - Find("(000",[Account]) - Len("(000"))
Thank you!
Assuming my guess is correct, and the pattern to look for is:
9 digits, starting with 6, preceded by 1 opening parenthesis and 3 zeros, followed by a space, 4 digits and a closing parenthesis
you can grab individual occurrences by:
column1: RXExtract([Account],'(?<=\\(000)6\\d{8}(?=\\s\\d{4}\\))',1)
column2: RXExtract([Account],'(?<=\\(000)6\\d{8}(?=\\s\\d{4}\\))',2)
etc.
The tricky bit is knowing how many columns to define, as you say there can be many. One way to find out is to first calculate the maximum number of occurrences like this:
maxn: Max((Len([Account]) - Len(RXReplace([Account],'(?<=\\(000)6\\d{8}(?=\\s\\d{4}\\))','','g'))) / 9)
still assuming that each number to extract has 9 digits. This compares the length of the original [Account] with its length after the matched patterns have been replaced by an empty string, and divides the difference by 9.
Then you know you can define up to maxn columns; the extra ones will be empty for rows with fewer occurrences.
Note that Spotfire always wants two backslashes for escaping (I had to add more in the editor to make it render correctly; I hope I have not missed any).
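The pattern can be checked outside Spotfire. Here is a Python sketch that runs the same regular expression on the example string from the question; Python's raw strings need a single backslash where Spotfire needs two, and `findall` stands in for the numbered RXExtract calls.

```python
import re

# Same pattern as above, with single backslashes in a Python raw string.
PATTERN = re.compile(r"(?<=\(000)6\d{8}(?=\s\d{4}\))")

account = ("1160752 Meranji Oil Sats -Mt(MA) (000600007056 0001), "
           "PE:Toolachee Gas Sats -Mt(MA) (000600007070 0003)"
           "GL: Contract Services (510000), COT: Network (N), "
           "CO: OM-A00009.0723,Oil Sats -Mt(MA) (000600007053 0003)")

# All occurrences, in order; element i-1 corresponds to occurrence i.
matches = PATTERN.findall(account)

# Occurrence count via the length difference, as in the maxn expression.
count = (len(account) - len(PATTERN.sub("", account))) // 9

print(matches, count)
```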
Is there a way to embed a number in a text string, while still rounding it to millions and showing an 'm'?
If it was just formatting, I would use:
#.#,, 'm'
and in text I would use:
text(ref, "#.#,,")
But how can you combine the two? The below does not work:
text(ref, "#.#,,m")
=TEXT(ROUND(ref,-6)/1000000,"#,##0")&"m"
This rounds the number to the nearest million and then replaces the six zeros with the letter 'm', e.g. 4,658,458,685 would become:
4,658m
Edit:
The following works as requested with everything inside the TEXT function:
=TEXT(ref, "#.#,," & """m""")
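For comparison, the arithmetic behind the ROUND-based formula can be sketched in Python. This is an illustration only: `to_millions` is a hypothetical helper, and it rounds halves away from zero in integer arithmetic because Python's built-in `round` rounds halves to even, which would differ from Excel's ROUND on exact ties.

```python
def to_millions(value: float) -> str:
    """Round to the nearest million and append 'm', e.g. 4,658m."""
    # Adding half a million before floor division rounds halves away from zero.
    millions = int((abs(value) + 500_000) // 1_000_000)
    sign = "-" if value < 0 and millions else ""
    return f"{sign}{millions:,}m"

print(to_millions(4_658_458_685))
```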
I have an Excel file that I need to save as .csv or .txt to create a specifically formatted file for a software package I'm using. Two of the columns in the .csv or .txt contain single- and double-digit numbers. When the Excel file is saved as .csv or .txt, these columns are separated by the corresponding separator (semicolon, tab, and so on).
What I'm looking for is how to add a space in front of each single-digit number so that it aligns to the right with the double-digit numbers. I have tried to do this with custom number formatting, but I always end up with spaces in front of both the double-digit and the single-digit numbers.
To try and illustrate (left side is standard csv, right side is what I'm looking for):
;14;3; --> ;14; 3;
;12;22; --> ;12;22;
;13;5; --> ;13; 5;
The displayed format (cell number formatting) is exported as the CSV element value. Use a number format of [>9]0;_(0 to add a leading space to single-digit values.
col1,col2,col3
AA, 2,2.00
BB, 3,3.00
CC, 4,4.00
DD, 5,5.00
EE, 6,6.00
FF, 7,7.00
GG, 8,8.00
HH, 9,9.00
II,10,10.00
JJ,11,11.00
KK,12,12.00
LL,13,13.00
MM,14,14.00
The middle field receives the custom number format.
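If the file is produced by a script instead of Excel's export, the same padding can be applied when the line is written. The following is a Python sketch, assuming three fields per row as in the sample output above; `format_row` is a made-up helper name.

```python
def format_row(name: str, count: int, amount: float) -> str:
    """Right-align the middle field to width 2; two decimals for the last."""
    return f"{name},{count:>2},{amount:.2f}"

rows = [("AA", 2, 2.0), ("HH", 9, 9.0), ("II", 10, 10.0)]
lines = [format_row(*row) for row in rows]
print("\n".join(lines))
```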
I am trying to convert a single column of numbers in Excel into multiple columns, depending on the content.
e.g. Table 1 contains 1 column that contains 1 or more numbers between 1 and 11 separated with a comma. Table 2 should contain 11 columns with a 1 or a 0 depending on the numbers found in Table 1.
I am using the following formula at present:
=IF(ISNUMBER(SEARCH("1",A2)),1,0)
The next column contains the following:
=IF(ISNUMBER(SEARCH("2",A2)),1,0)
All the way to 11
=IF(ISNUMBER(SEARCH("11",A2)),1,0)
The problem with this, however, is that the formula for finding references to 1 also finds the references to 11. Is it possible to write a formula that can tell the difference, so that if I have the following in Table 1:
2, 5, 11
It doesn't put a 1 in column 1 of Table 2?
Thanks.
For a list separated by just a comma, use:
=IF(ISNUMBER(SEARCH(",1,", ","&A2&",")),1,0)
If the list is separated by a comma followed by a space, use:
=IF(ISNUMBER(SEARCH(", 1,", ", "&A2&",")),1,0)
A version of LS_dev's answer that will cope with 0...n spaces before or after each comma is:
=IF(ISNUMBER(SEARCH(", 1 ,",", "&TRIM(SUBSTITUTE(A2,","," , "))&" ,")),1,0)
The SUBSTITUTE makes sure there's always at least one space before and after each comma and the TRIM replaces multiple spaces with one space, so the result of the TRIM function will have exactly one space before and after each comma.
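The delimiter-wrapping formulas above all amount to comparing whole tokens instead of substrings. A minimal Python sketch of that idea, assuming the cell holds integers separated by commas with optional spaces; `flags` is a hypothetical helper name.

```python
def flags(cell: str, upper: int = 11) -> list[int]:
    """Return one 0/1 flag per number from 1 to upper found in the cell."""
    # int() tolerates surrounding spaces, so "2, 5 ,11" parses cleanly.
    present = {int(tok) for tok in cell.split(",") if tok.strip()}
    return [1 if n in present else 0 for n in range(1, upper + 1)]

print(flags("2, 5, 11"))
```

Because each token is compared as a whole number, 11 does not set the flag for 1.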
How about using the SUBSTITUTE function to change all "11" to Roman numeral "XI" prior to doing your search:
=IF(ISNUMBER(SEARCH("1",SUBSTITUTE(A2, "11", "XI"))),1,0)
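If you only want to eliminate the "11" case, test the two conditions separately, because AND and NOT cannot be applied to the search strings themselves. This is all based on hardcoded values (it still matches the 1 in 10, and it returns 0 when both 1 and 11 are present), so there should be a smarter solution.
=IF(AND(ISNUMBER(SEARCH("1",A2)),NOT(ISNUMBER(SEARCH("11",A2)))),1,0)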