I first applied destring to an ID variable with 17 digits. The values were converted, but they were then displayed in scientific notation. So I tried format %20.0f. Now all digits are shown, but the last 2-3 digits have changed.
Stata cannot store a 17-digit integer exactly: even a double holds integers exactly only up to 2^53 = 9,007,199,254,740,992, which has 16 digits.
Your best option is probably to keep the ID as a string.
The command format only affects how a data point is displayed to humans, not how it is actually stored.
This is to complement the answer by @TheIceBear.
format never changes values. The problem is that your string is too big even for its numeric equivalent to be held exactly in a double, except occasionally.
clear
set obs 5
gen id = 17*"9" in 1
replace id = 16*"9" + "6" in 2
replace id = 16*"9" + "2" in 3
replace id = 15*"9" + "88" in 4
replace id = 15*"9" + "84" in 5
format id %20s
destring id, gen(nid)
format nid %20.0f
list
+----------------------------------------+
| id nid |
|----------------------------------------|
1. | 99999999999999999 100000000000000000 |
2. | 99999999999999996 100000000000000000 |
3. | 99999999999999992 100000000000000000 |
4. | 99999999999999988 99999999999999984 |
5. | 99999999999999984 99999999999999984 |
+----------------------------------------+
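Python floats are IEEE 754 double-precision numbers, the same 8-byte storage type as a Stata double, so a minimal Python sketch can reproduce the rounding in the listing above. This is an illustration outside Stata, not Stata code, and the helper name as_double is hypothetical.

```python
# Python floats are IEEE 754 doubles, like Stata's double storage type,
# so parsing the same 17-digit strings reproduces the rounding shown above.
ids = [
    "99999999999999999",
    "99999999999999996",
    "99999999999999992",
    "99999999999999988",
    "99999999999999984",
]


def as_double(s: str) -> int:
    """Hypothetical helper: store a digit string in a double, return the exact integer kept."""
    return int(float(s))


for s in ids:
    print(s, as_double(s))

# Near 1e17 consecutive doubles are 16 apart, so the last digits cannot survive.
# Integers are exact only up to 2**53 = 9007199254740992:
print(float(2**53 + 1) == float(2**53))  # True
```

The five printed pairs match the id/nid columns in the Stata listing, including the tie at 99999999999999992, which rounds to the even neighbour 100000000000000000.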
Related
I am new to Stata and I assume this is a beginner question. Yet I have just spent the last hour searching the internet for an answer, to no avail!
I am using World Bank GDP data (imported from a CSV file), and the data are stored as strings. When I destring, the decimal points in the GDP values are dropped, and each value simply comes out as one big number.
destring yr*, replace ignore("..")
Here is a sample of my data:
yr2016
205276172134.901
..
13397100000
When I run the command above, the values become:
yr2016
2.053e+14
1.340e+10
As you can see, the .901 was absorbed into the number instead of being treated as the decimal part.
I have tried:
set dp period
But it didn't work.
You just need to set the format of the converted variable:
clear
set obs 1
generate string = "205276172134.901"
destring string, generate(numeric)
list
+------------------------------+
| string numeric |
|------------------------------|
1. | 205276172134.901 2.053e+11 |
+------------------------------+
format numeric %18.0g
list
+-------------------------------------+
| string numeric |
|-------------------------------------|
1. | 205276172134.901 205276172134.901 |
+-------------------------------------+
Type help format for more information.
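The same distinction between display format and stored value can be seen in Python, used here only as an analogy: a short %g-style format shows 2.053e+11, a wider one shows every digit, and the underlying double never changes. The variable names are illustrative.

```python
# Formatting changes only the text shown; the stored double is untouched.
value = float("205276172134.901")

short = f"{value:.4g}"   # analogous to a narrow %g display format
full = f"{value:.15g}"   # enough significant digits to show the decimals

print(short)  # 2.053e+11
print(full)   # 205276172134.901
```

Both strings come from the same, unchanged value, which is why applying a format in Stata never alters the data.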
The problem is that the ignore() option removes every instance of the character . from the string variable: ignore() lists individual characters to remove, so Stata does not look for the sequence of two consecutive periods. There is no need to use ignore() in this case. Try destring var, replace force and let Stata set the rows containing .. to missing.
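The character-by-character behaviour of ignore() can be mimicked in Python. This is a sketch under the assumption, stated above, that ignore() deletes each listed character wherever it occurs; strip_chars and to_number are hypothetical helpers, with to_number standing in for destring's force option by turning non-numeric strings into None (Python's stand-in for a missing value).

```python
# Hypothetical helper mimicking destring's ignore(): every listed character
# is deleted wherever it occurs, not the two-character sequence as a whole.
def strip_chars(s: str, chars: str) -> str:
    return s.translate(str.maketrans("", "", chars))


def to_number(s: str):
    """Like destring with force: non-numeric strings become None (missing)."""
    try:
        return float(s)
    except ValueError:
        return None


raw = ["205276172134.901", "..", "13397100000"]

print([strip_chars(s, "..") for s in raw])
# ['205276172134901', '', '13397100000']  (the decimal point is gone)

print([to_number(s) for s in raw])
# [205276172134.901, None, 13397100000.0]
```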
I have a person identification number variable in a panel dataset that is of string type with 19 characters (str19). Whenever I convert it into numeric using the destring command I lose precision because it is converted into either double (max 16 characters) or float, meaning that the ID numbers no longer identify respondents uniquely. I need it to be numeric in order to treat the data as panel (xt commands). What can I do?
The best way forward I can think of is to use egen's group() function to create identifiers. You don't provide a data or code example, but this illustrates the point.
. clear
. set obs 1
number of observations (_N) was 0, now 1
. gen strid = "1234567890123456789"
. egen numid = group(strid), label
. list
+-------------------------------------------+
| strid numid |
|-------------------------------------------|
1. | 1234567890123456789 1234567890123456789 |
+-------------------------------------------+
. list, nolabel
+-----------------------------+
| strid numid |
|-----------------------------|
1. | 1234567890123456789 1 |
+-----------------------------+
Note that this is documented: see this FAQ.
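The idea behind group() can be sketched in Python: distinct string IDs are sorted and numbered 1, 2, 3, ..., and the mapping back to the original strings plays the role of the value label. group_ids is a hypothetical helper and the second ID is made up for illustration; the point is that the codes are small integers, so they fit exactly in a numeric type however long the original strings are.

```python
# Sketch of what egen's group() does: distinct values are sorted and
# numbered 1, 2, 3, ...; the original strings are kept as labels.
def group_ids(ids):
    codes = {s: i for i, s in enumerate(sorted(set(ids)), start=1)}
    labels = {i: s for s, i in codes.items()}
    return [codes[s] for s in ids], labels


ids = ["1234567890123456789", "1234567890123456780", "1234567890123456789"]
numid, labels = group_ids(ids)
print(numid)   # [2, 1, 2]
print(labels)  # {1: '1234567890123456780', 2: '1234567890123456789'}
```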
I'm trying to do a match-and-calculate formula in Excel (or in Numbers for Mac; either works for me. I have tried both, since they seem equivalent and even the function names are the same).
This is what I have:
|     1     |      2      |     3     |
|-----------+-------------+-----------|
| Category  | other stuff | duration  |
|-----------+-------------+-----------|
| A         | ...         | 00:01:23  |
| A         | ...         | 00:30:19  |
| B         | ...         | ...       |
| A         | ...         | 00:22:12  |
| ...       | ...         | ...       |
So column 3 holds a duration in the format "hh:mm:ss", and column 1 holds all of my categories.
I want to find all rows in my table whose category in column 1 is "A", take the corresponding value in column 3, split the string and convert the parts to numbers (in particular, I want to convert them to seconds, so hh*3600+mm*60+ss), and finally sum all these values. Is this possible?
I'm new to Excel and Numbers, but I'm fairly familiar with programming languages in general. This is what I'd do in code:
global_secs=0;
for(row r=top to end){
if(r.get_column(1).content_equals("A")){
cell c=r.get_column(3);
string=split(c.get_content(),":")
global_secs+=int(string[1])*3600+int(string[2])*60+int(string[3])
}
}
Is there a way to achieve this in Excel sheet (or Numbers)?
I'd like to do all of this using only one or more formulas in Excel or Numbers.
One more thing: I do not want to change cell formats, because this should be an automatic process without human interaction. Unless there is a function that changes the format of a range of cells dynamically, I prefer not to do that. (I know I can format the cells as "duration" and sum them without converting to integers, but my data are originally in hh:mm:ss format.)
Thanks so much!
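For comparison, the loop in the question translates directly into Python. The rows below are hypothetical (the B row's duration is invented, since the table elides it), and total_seconds is an illustrative name, not a spreadsheet function.

```python
# Runnable version of the pseudocode above; the rows are hypothetical
# (category, other, duration) tuples standing in for spreadsheet rows.
rows = [
    ("A", "...", "00:01:23"),
    ("A", "...", "00:30:19"),
    ("B", "...", "00:10:00"),  # invented value; the table elides it
    ("A", "...", "00:22:12"),
]


def total_seconds(rows, category):
    total = 0
    for cat, _other, duration in rows:
        if cat == category:
            hh, mm, ss = (int(part) for part in duration.split(":"))
            total += hh * 3600 + mm * 60 + ss
    return total


print(total_seconds(rows, "A"))  # 3234
```

Here 83 + 1819 + 1332 = 3234 seconds for category A.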
The formula you are looking for is
=SUMIF(A2:A5,"A",C2:C5)
The easiest way to get the result in seconds would be to give the cell the custom number format [ss]. Since you don't want to change any formatting, another way is
=HOUR(result) * 3600 + MINUTE(result) * 60 + SECOND(result)
So the formula becomes
=HOUR(SUMIF(A2:A5,"A",C2:C5)) * 3600 + MINUTE(SUMIF(A2:A5,"A",C2:C5)) * 60 + SECOND(SUMIF(A2:A5,"A",C2:C5))
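One caveat worth checking: Excel stores a duration as a fraction of a day, and HOUR() returns only 0 to 23, so if the summed durations reach 24 hours the whole days are dropped by the HOUR/MINUTE/SECOND formula. Multiplying the day fraction by 86400 (seconds per day) avoids that. The Python sketch below mimics both calculations on a day-fraction value; the helper names are hypothetical, and it approximates Excel's behaviour rather than calling Excel.

```python
# Excel stores a duration as a fraction of a day. excel_hms_seconds mimics
# HOUR/MINUTE/SECOND on such a value (they ignore whole days);
# direct_seconds multiplies by the number of seconds in a day.
def excel_hms_seconds(day_fraction: float) -> int:
    total = round(day_fraction * 86400)   # whole seconds in the value
    within_day = total % 86400            # HOUR() etc. only see this part
    hour, rest = divmod(within_day, 3600)
    minute, second = divmod(rest, 60)
    return hour * 3600 + minute * 60 + second


def direct_seconds(day_fraction: float) -> int:
    return round(day_fraction * 86400)


small = 3234 / 86400    # 00:53:54
large = 90000 / 86400   # 25:00:00, longer than one day

print(excel_hms_seconds(small), direct_seconds(small))  # 3234 3234
print(excel_hms_seconds(large), direct_seconds(large))  # 3600 90000
```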
This looks like a job for an array formula:
=SUM(N($A$2:$A$8="A")*$C$2:$C$8)
where column A contains the category and column C the duration. Note that you need to press Ctrl+Shift+Enter to enter it as an array formula.
To convert the result to seconds, an alternative to @Mrig's solution is to format the result as text and convert it back to a number, i.e.
=VALUE(TEXT(SUM(N($A$2:$A$8="A")*$C$2:$C$8),"[ss]"))
I want to return a label(s) based on an intersection of a Row and Column equal to "Yes".
| Location |
ID | Tool | Wall | Bin | Toolbox | Count
---+--------+------+-----+---------+-------
1. | Axe | YES | | | 1
2. | Hammer | | | YES | 5
3. | Pliers | | | YES | 2
4. | Nails | | YES | | 500
5. | Hoe | YES | | | 2
6. | Screws | | YES | | 200
7. | Saw | YES | | | 3
What's in the toolbox? (Results wanted)
Axe, Wall, 1
Hammer, Toolbox, 5
Pliers, Toolbox, 2
Nails, Bin, 500
Hoe, Wall, 2
Screws, Bin, 200
Saw, Wall, 3
I also want to be able to add tools and locations.
Without using VBA, this is going to be a bit of a pain, but it is workable if you don't mind helper columns. I don't advise trying to do this in a single array formula, because text strings are hard to work with in array formulas. If you have an array of numbers, you can turn it into a single result in many ways (MIN, MAX, AVERAGE, SUM, etc.). An array of strings is harder to work with; in particular, there's no easy way to concatenate an array of strings.
So instead, use a helper column. Assume column A = toolname, column B = a check for being on the wall, column C = a check for being in the bin, column D for being in the toolbox, and column E for the number available.
FORMATTING SIDE NOTE
First, I will say that I recommend you use TRUE/FALSE instead of "yes"/"no". This is because it is easier to use TRUE / FALSE within Excel formulas. For example, if you want to add A1 + A2 when A3 = "yes", you can do so like this:
=IF(A3="yes",A1+A2)
But if you want to check whether A3 = TRUE, this is simplified:
=IF(A3,A1+A2)
Here, we didn't need to hardcode "yes", because A3 itself will either be TRUE or FALSE. Also consider if you want to make a cell "yes" if A3 > 5, and otherwise be "no". You could do it like this:
=IF(A3>5,"yes","no")
Or, if you used TRUE/FALSE, you could simply use:
=A3>5
However, I'll assume that you keep the formatting you currently have (I would also recommend you just have a single cell that says either "toolbox"/"bin" etc., instead of 4 columns where 1 says "yes", but we'll also assume that this needs to be this way).
BACK TO YOUR QUESTION
Put this in column F, in F2 for the first cell:
=Concatenate(A2," ",INDEX($B$1:$D$1,MATCH("yes",B2:D2,0))," ",E2)
Concatenate combines strings of text into a single new text string. You could also use &, as in A2 & " " & ..., but with this many terms Concatenate is probably easier to read. MATCH finds the position of the first "yes" in the current row, and INDEX returns the header in row 1 at that position.
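Then put the following in F3 and drag down, so that each row refers to its own columns A to E and to the cell directly above it in column F:
=Concatenate(F2," ",A3," ",INDEX($B$1:$D$1,MATCH("yes",B3:D3,0))," ",E3)
This puts a space between the text accumulated from the rows above and the current row. If instead you want each row to appear after a line break, use this (and turn on Wrap Text for the cell so that the breaks are displayed):
=Concatenate(F2,CHAR(10),A3," ",INDEX($B$1:$D$1,MATCH("yes",B3:D3,0))," ",E3)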
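The helper-column logic can be sketched in Python to check the expected output: MATCH finds the position of the first "yes" (case-insensitively, as Excel's MATCH does for text), INDEX picks that header, and each cell in column F appends its row to the text of the cell above. Only the first four tools from the table are used, and row_text and column_f are hypothetical names.

```python
# Sketch of the helper-column logic in column F.
headers = ["Wall", "Bin", "Toolbox"]          # B1:D1
rows = [                                      # A, B:D, E for rows 2 to 5
    ("Axe",    ["YES", "", ""],   1),
    ("Hammer", ["", "", "YES"],   5),
    ("Pliers", ["", "", "YES"],   2),
    ("Nails",  ["", "YES", ""], 500),
]


def row_text(tool, flags, count):
    # INDEX($B$1:$D$1, MATCH("yes", B:D, 0)); lower() mimics case-insensitive MATCH
    location = headers[[f.lower() for f in flags].index("yes")]
    return f"{tool} {location} {count}"


column_f = []
for tool, flags, count in rows:
    text = row_text(tool, flags, count)
    # F2 holds only its own row; each later cell prepends the cell above it.
    column_f.append(text if not column_f else column_f[-1] + " " + text)

print(column_f[-1])  # Axe Wall 1 Hammer Toolbox 5 Pliers Toolbox 2 Nails Bin 500
```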
I currently have economic data in the format YYYY.QX, where Q stands for "Quarter" and X is in [1,4]. The variable is stored as a string.
I've tried date(series, "YMD") together with the format command, as well as the encode function.
Ideally, I'd end up with a numerical variable indicating something like:
YYYY.X
YYYY.M, where "M" is the first month of that quarter
YYYYMM01, where "MM" is the first month of that quarter.
It's best to show exactly what code you tried and what Stata did or said in response.
Such dates are quarterly dates so treating them as anything else is at best indirect and at worst quite wrong.
. set obs 1
obs was 0, now 1
. gen example = "2013.Q4"
. gen qdate = yq(real(substr(example, 1,4)),real(substr(example, -1,1)))
. list
+-----------------+
| example qdate |
|-----------------|
1. | 2013.Q4 215 |
+-----------------+
. format qdate %tq
. list
+------------------+
| example qdate |
|------------------|
1. | 2013.Q4 2013q4 |
+------------------+
Note that any code treating these strings as daily dates can only be wrong. Note also that encode (incidentally not a function, but a command) cannot help here unless you specify every string date explicitly as a value label.
UPDATE Note that the function date() is not an all-purpose function for creating any kind of date: it is only for daily dates. There is in fact a synonym daily().
This example shows that using quarterly() is another possibility.
. di quarterly(substr("2013.Q4", 1,4) + " " + substr("2013.Q4", -1,1), "Yq")
215
For a variable series containing such string dates, you could type
. gen qdate = quarterly(substr(series, 1, 4) + " " + substr(series, -1, 1), "Yq")
. format qdate %tq
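Stata's quarterly dates are integers counting quarters from 1960q1 = 0, which is why 2013q4 appears as 215 before formatting. A minimal Python sketch of that arithmetic, assuming the strings always look like "YYYY.QX" (yq, parse_quarter and format_tq here are hypothetical Python helpers, not the Stata functions):

```python
# Stata's quarterly dates count quarters from 1960q1 = 0.
def yq(year: int, quarter: int) -> int:
    return 4 * (year - 1960) + (quarter - 1)


def parse_quarter(s: str) -> int:
    """Parse 'YYYY.QX': first 4 characters are the year, the last is the quarter."""
    return yq(int(s[:4]), int(s[-1]))


def format_tq(q: int) -> str:
    """Render like Stata's %tq format, e.g. 215 -> '2013q4'."""
    year, quarter = divmod(q, 4)
    return f"{1960 + year}q{quarter + 1}"


q = parse_quarter("2013.Q4")
print(q, format_tq(q))  # 215 2013q4
```

Because divmod floors, quarters before 1960 also render correctly, for example -1 becomes 1959q4.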