How to recode strings in Stata?

I would like to change the values in my Nationality variable. For some countries, there is a space after the name of the country ("Germany" and "Germany "), so that they don't appear to be the same country when looking at frequencies. I tried using "encode" to make the values numeric, so that I could use "recode" to change the values, but that gave me the error "unknown el Germany in rule". I attached a screenshot.

recode is for numeric variables only; for a string variable you can use replace.
replace Nationality = "Belgium" if Nationality == "Belgium "
However, a more convenient fix for the problem at hand is strtrim(), which removes leading and trailing spaces:
replace Nationality = strtrim(Nationality)
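For readers who want to see the effect outside Stata, here is a minimal Python sketch of the same idea: stripping leading and trailing spaces so that "Germany" and "Germany " collapse into one category. The sample values are made up for illustration and are not taken from the asker's data.

```python
from collections import Counter

# Hypothetical sample values; note the trailing spaces on some entries.
nationality = ["Germany", "Germany ", "Belgium ", "Belgium", "France"]

# Before cleaning, "Germany" and "Germany " are counted as different values.
raw_counts = Counter(nationality)

# str.strip() removes leading and trailing whitespace,
# which is roughly what strtrim() does for spaces in Stata.
clean_counts = Counter(value.strip() for value in nationality)

print(raw_counts)
print(clean_counts)
```

After cleaning, each country appears once in the frequency table, which is the outcome the asker wanted from recode.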

Related

How to recover the second element separated by a character with INDEX and EQUIV

I got a table in Excel like this:
I used INDEX with two EQUIV calls to pull only the price for column A, column B, and column C, like this:
=INDEX($J$1:$L$4;EQUIV($F6;$J$1:$J$4;0);EQUIV(Z$24;$J$1:$L$1;0))
But I would like to look up only the value to the right of the ";", and I don't know how to combine that with my INDEX and EQUIV so that it matches just the values 111, 1456, 44455.
I have this:
EQUIV() is the French name for MATCH(), am I right?
If so, just use a wildcard match:
=MATCH("*;"&$F6,$J$1:$J$4,0)
Or the French equivalent:
=EQUIV("*;"&$F6;$J$1:$J$4;0)
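To illustrate what the wildcard lookup does, here is a rough Python sketch. The `match_wildcard` helper and the sample cells are hypothetical; the function only mimics an exact-match MATCH with a `*` wildcard (1-based position, case-insensitive) and does not reproduce all of Excel's wildcard rules.

```python
from fnmatch import fnmatchcase

def match_wildcard(pattern, cells):
    """Return the 1-based position of the first cell matching pattern, or None."""
    for position, cell in enumerate(cells, start=1):
        # Lower-case both sides because Excel's MATCH ignores case.
        if fnmatchcase(str(cell).lower(), pattern.lower()):
            return position
    return None

# Hypothetical lookup column: a header plus "something;key" entries.
column_j = ["Ref", "X;111", "Y;1456", "Z;44455"]

# Equivalent in spirit to MATCH("*;" & key, column, 0).
found = match_wildcard("*;" + "1456", column_j)
missing = match_wildcard("*;" + "999", column_j)
print(found, missing)
```

The leading `*;` means "anything, then a semicolon", so the key only has to match the part after the separator.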
Your question is not quite clear. I am assuming you have multiple values separated by a semicolon (";") in the Price column and that you want only a portion of each, in this case the right-hand part. If so, here is a solution:
Price
112233;50.99
223344;15.50
3344;150.5
To get the left side, use
=LEFT(C2,FIND(";",C2)-1)
Here you have to subtract 1 because we don't want to include the semicolon at the end.
To get the right side, use
=RIGHT(C2,LEN(C2)-FIND(";",C2))
Result:
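As a cross-check of the LEFT/RIGHT logic, the same split can be sketched in Python using the three sample values from the Price column. This is only an illustration of the technique, not part of the Excel solution.

```python
prices = ["112233;50.99", "223344;15.50", "3344;150.5"]

# partition() splits at the first ";" and returns (left, separator, right);
# the [::2] slice keeps the left and right parts and drops the separator.
split_rows = [cell.partition(";")[::2] for cell in prices]

for left, right in split_rows:
    print(left, right)
```

Everything before the first semicolon corresponds to the LEFT formula, and everything after it corresponds to the RIGHT formula.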

MS Excel Formula assistance

I have a cell I need to split into 2 cells.
Data Sample: Note: All Cells are formatted as TEXT
"3851v61_18.005_ Have the anchors for all suspended scaffolding system suspension lines and separate vertical lifelines been verified? "
Data Sample 2: Parent_ID
Steps:
Need to check to see if the cell value starts with number.
Also, if it contains a special character ("_"), it may have more than one.
Display cell #1 = just the ID number containing the underscore(s).
Display cell #2 = just the text to the right of the underscore. However, if the original cell starts with alpha characters, then display the actual value, i.e. Parent_ID.
Strip off any erroneous underscores left hanging.
Expected results:
Cell #1: "3851v61_18.005" (ID number portion of the text)
Cell #2: "Have the anchors for all suspended scaffolding system suspension lines and separate vertical lifelines been verified?"
This is what I have so far: (If it does not start with a number, then return the value of the cell, else continue with the equation)
=IF(NUMBERVALUE(LEFT(C321,1))>=1,IFERROR(LEFT(C321, FIND("_",C321)-1), C321),FALSE)
=IFERROR(RIGHT(C321,LEN(C321)-FIND("_",C321)), C321)
If there is more than one underscore, Cell 1 needs to keep everything up to the last underscore (the whole ID number) and strip off the text after it. Likewise, Cell 2 should display only the text after the last underscore.
Thank you for any assistance offered.
I think I understand but am not 100% sure.
Try something like the below to get the full string (if it starts with something that isn't a number) or the string up to the last underscore (if it does start with a number):
=IF(NOT(ISNUMBER(NUMBERVALUE(LEFT($D1,1)))), $D1,
LEFT($D1, FIND("!!!", SUBSTITUTE($D1, "_", "!!!",
LEN($D1)-LEN(SUBSTITUTE($D1, "_", ""))))-1))
Then, in a similar fashion, try something like the below to get the full string (if it starts with something that isn't a number) or the string to the right of the last underscore (if it does start with a number):
=IF(NOT(ISNUMBER(NUMBERVALUE(LEFT($D1,1)))), $D1,
RIGHT($D1, LEN($D1)-FIND("!!!", SUBSTITUTE($D1, "_", "!!!",
LEN($D1)-LEN(SUBSTITUTE($D1, "_", ""))))))
For example:
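The logic of the two formulas can be sketched in Python as follows. The `split_id` helper is hypothetical, and it makes two assumptions that go beyond the formulas: it trims the surrounding spaces from the text part, and it returns the whole value when a cell starts with a digit but contains no underscore.

```python
def split_id(cell):
    """Return (cell_1, cell_2): the ID up to the last "_" and the text after it."""
    starts_with_digit = bool(cell) and cell[0] in "0123456789"
    if not starts_with_digit:
        # Values such as "Parent_ID" are shown unchanged in both cells.
        return cell, cell
    # rpartition() splits at the LAST underscore.
    head, sep, tail = cell.rpartition("_")
    if not sep:
        # Assumption: no underscore at all, so fall back to the whole value.
        return cell, cell
    # Assumption: strip the spaces left around the text part.
    return head, tail.strip()

sample = ("3851v61_18.005_ Have the anchors for all suspended scaffolding "
          "system suspension lines and separate vertical lifelines been verified? ")
print(split_id(sample))
print(split_id("Parent_ID"))
```

Splitting at the last underscore is what keeps "3851v61_18.005" together even though the ID itself contains an underscore.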

Find specific characters and return the next value in the cell using Excel Formula

I am not sure where to begin with the formula, as I have gotten myself so confused with everything. I have a cell that contains "PON " or "PON: " or "PON = " followed by the actual PON (example: PON 123467). I want the formula to return 123467 in the cell.
Examples | What I want returned
I have PON 123467 for shoes | 123467
I have PON: 234567-AB for food | 234567-AB
I have PON - 569874-Weird for accessories | 569874-Weird
I have PON = DOG-564-987 for dog food | DOG-564-987
I am currently using Excel 365
FILTERXML() is a good fit in this case. Try:
=FILTERXML("<t><s>"&SUBSTITUTE(FILTERXML("<t><s>"&SUBSTITUTE(A1," for","</s><s>")&"</s></t>","//s[1]")," ","</s><s>")&"</s></t>","//s[last()]")
Using FILTERXML, and testing for a substring following PON, you can try:
=FILTERXML("<t><s>"&SUBSTITUTE(TRIM(A1)," ","</s><s>") & "</s></t>","//s[contains(.,'PON')]/following-sibling::*[string-length(.)>2][1]")
Note that the FILTERXML solution will cause a PON that is solely numeric but has a leading zero to drop the leading zero. Unfortunately, the XPath implementation in that function does not include the string() function.
If dropping the leading zero might be a problem, you can add a character to the node that will force the number to be seen as a string. In the modified formula below, I use the Unicode zero-width space, but there are others you can use. Note that this will count as a character for the string-length function, so be sure to maintain the >2 parameter:
=FILTERXML("<t><s>"&SUBSTITUTE(TRIM(A1)," ","</s><s>"&UNICHAR(8203)) & "</s></t>","//s[contains(.,'PON')]/following-sibling::*[string-length(.)>2][1]")
Because of the variability in your data (sometimes there are extraneous space-separated substrings between PON and the value you want to extract), the XPath:
locates the substring PON
returns all subsequent siblings that have a string-length of more than two (adjust if necessary)
returns the first sibling that meets that criterion.
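The three XPath steps above can be sketched in plain Python, which may make the selection rule easier to follow. The `extract_pon` function and its `min_length` parameter are hypothetical names; the sketch splits on whitespace instead of building XML, and it keeps every result as text, so a leading zero is not dropped.

```python
def extract_pon(text, min_length=3):
    """Return the first token after a PON marker that is at least min_length long."""
    tokens = text.split()  # space-separated substrings, like the <s> nodes
    for index, token in enumerate(tokens):
        if "PON" in token:  # step 1: locate the substring PON
            for candidate in tokens[index + 1:]:
                # steps 2 and 3: first following token longer than two characters
                if len(candidate) >= min_length:
                    return candidate
    return None

examples = [
    "I have PON 123467 for shoes",
    "I have PON: 234567-AB for food",
    "I have PON - 569874-Weird for accessories",
    "I have PON = DOG-564-987 for dog food",
]
results = [extract_pon(line) for line in examples]
print(results)
```

The length filter is what skips the stray "-" and "=" tokens that sit between PON and the value.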
You might try this formula.
=TRIM(LEFT(MID(A2,MIN(FIND({0,1,2,3,4,5,6,7,8,9},A2&"0123456789")),100),FIND(" ",MID(A2,MIN(FIND({0,1,2,3,4,5,6,7,8,9},A2&"0123456789")),100)&" ")))
It extracts the text between the first number and the first space following that number. The size of that extract is limited to 100 characters.

Excel: Formula to combine potentially incomplete address components into an address string with appropriate separators

I need to intake U.S. address data by street, city, state, and ZIP code and output an address in the following format:
1502 Bruce Rd, Oreland, PA 19075
without any superfluous spaces.
Any component of the address can be missing, so each of the four elements either exists or doesn't, which results in a total of 2^4 = 16 combinations of having or not having each of the four elements. Excluding the case where they're all missing, that leaves 15 combinations to deal with.
There seem to be lots of questions about splitting addresses apart, but none about combining them back together, especially when you need to either include or exclude the spacing based on what comes after. What's the best way to handle this? Bonus points for a way that's easily extensible (for example, I won't be surprised if later we have to include a unit field between street and city at some point in the future).
As I was writing up this question, I came up with a couple solutions, where columns A, B, C, and D are street, city, state, and ZIP code, respectively.
First is this one:
=TRIM(
IF(A1="", "", A1 & IF(B1&C1&D1="", "", ", ")) &
IF(B1="", "", B1 & IF( C1&D1="", "", ", ")) &
IF(C1="", "", C1 & IF( D1="", "", " " )) &
IF(D1="", "", D1)
)
which works as follows:
See if the element exists and, if so, include it.
If the element exists, see if anything exists after it and, if so, add the spacer that follows it. (This works because spacers seem to be determined by what is before the spacer.)
And the whole thing's wrapped in a TRIM to get rid of unnecessary user-input spaces.
To check if anything after an element exists, I concatenated the following fields and checked to see whether that concatenation was a blank string.
Because of the use of concatenation here, I thought it might be easier to use TEXTJOIN and came up with this:
=TEXTJOIN(" ", TRUE, TEXTJOIN(", ", TRUE, TRIM(A1), TRIM(B1), TRIM(C1)), TRIM(D1))
which works as follows:
The inner TEXTJOIN combines the first three elements (street, city, and state) with the common delimiter of a comma + a space.
Once that's done, the outer TEXTJOIN combines the result of that with the ZIP code using a space as a delimiter.
TRIM used again as above.
These seem to cover every combination of present and missing fields and aren't too hard to extend if additional fields need to be added, although I'm definitely open to any better solutions you might have.
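To compare the two approaches side by side, here is a small Python sketch that mirrors each formula. The function names are hypothetical, and `excel_trim` is only a rough stand-in for Excel's TRIM (it collapses any whitespace, not just the space character).

```python
def excel_trim(text):
    # Rough stand-in for TRIM: strip the ends and collapse inner runs of whitespace.
    return " ".join(text.split())

def join_with_ifs(street, city, state, zip_code):
    # Mirrors the IF-based formula: each separator is chosen by the element before it.
    out = ""
    if street:
        out += street + (", " if city + state + zip_code else "")
    if city:
        out += city + (", " if state + zip_code else "")
    if state:
        out += state + (" " if zip_code else "")
    out += zip_code
    return excel_trim(out)

def join_with_textjoin(street, city, state, zip_code):
    # Inner join: street, city, state with ", "; empty parts are skipped.
    left = ", ".join(p for p in map(excel_trim, (street, city, state)) if p)
    # Outer join: the result plus the ZIP code with a single space.
    return " ".join(p for p in (left, excel_trim(zip_code)) if p)

full = ("1502 Bruce Rd", "Oreland", "PA", "19075")
no_state = ("1502 Bruce Rd", "Oreland", "", "19075")
print(join_with_ifs(*full))
print(join_with_textjoin(*full))
print(join_with_ifs(*no_state))
print(join_with_textjoin(*no_state))
```

One thing the sketch makes visible: with a city and ZIP code but no state, the IF-based version yields "Oreland, 19075" while the TEXTJOIN-style version yields "Oreland 19075", so the two are not interchangeable in that case.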

Find space in a list

I have a list (below) that is the output from an Excel table. The Excel table has 3 columns: Month, Col1, Col2 and the output format is CSV.
January,630648,97646 February,576204,87616 March,998287,142008 April,782340,118664 May,1678775,205862 June,1976671,295065 July,3349937,438844 August,0,0 September,0,0 October,0,0 November,0,0 December,0,0
I want to display this as an HTML table. I tried using List and Array functions, but could not achieve the desired result. How can I change the empty space to a delimiter, or is there a better way to do this?
Update from comments:
I am using cfspreadsheet to read an Excel table with 3 columns:
<cfspreadsheet src="../../../../file.xlsx"
action="read"
name="myquery"
sheetname="2014"
rows="6-17" columns="10,11,12"
format="csv"
columnnames="Month,Col1,Col2"
headerrow="4"
excludeheaderrow="false">
When I used the replace function, it did not do anything to the list. I then tried ListChangeDelims as suggested. However, that just changed the , to ; while the spaces remained as they were.
A list is just a string, so use one of the string functions.
replace(myString," ",";","all")
will replace all the spaces in the string with semi-colons.
You could also use ListChangeDelims() and convert the spaces to the delimiter that you want.
ListChangeDelims(list, new_delimiter [, delimiters, includeEmptyValues ])
So, this would change spaces and commas to semi-colons:
ListChangeDelims(myList,";",", ")
It's important to include both the space and the comma in the delimiters argument.
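The underlying idea, treating the space as the row delimiter and the comma as the column delimiter, can be sketched in Python using the data from the question. The `to_html_table` helper is a hypothetical name, and this is an illustration of the parsing step, not a ColdFusion solution.

```python
from html import escape

raw = ("January,630648,97646 February,576204,87616 March,998287,142008 "
       "April,782340,118664 May,1678775,205862 June,1976671,295065 "
       "July,3349937,438844 August,0,0 September,0,0 October,0,0 "
       "November,0,0 December,0,0")

# Spaces separate the records; commas separate the fields inside a record.
rows = [record.split(",") for record in raw.split()]

def to_html_table(rows, headers):
    """Render the rows as a minimal HTML table with one header row."""
    head = "".join(f"<th>{escape(h)}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(v)}</td>" for v in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"

table_html = to_html_table(rows, ["Month", "Col1", "Col2"])
print(table_html)
```

Splitting in two passes keeps the row and column structure, whereas converting every delimiter to one character would flatten the list into a single run of values.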
