Dataset Sample
I have a data set like the one in the attached picture, and I want to keep only the observations that have the same numsecur every year.
How do I do this with PROC SQL in SAS? Would it be easier to do in Stata? If so, what procedure can I use?
You look like a new user to Stack Overflow. Welcome. Your question is getting downvoted for at least three reasons:
1) It's not really clear what you want from your description of the problem and the data
you're providing.
2) You haven't shown what you've tried.
3) Providing your data as a picture is not great. If you're going to provide data, it's
most helpful to provide it in a form that others can easily load into their own program.
After all, you're asking for our help, so make it easier for us to help you. If you
included something like the following, we would just have to copy and paste to create your
dataset to work with:
DATA test;
INPUT ID YEAR EXEC SUM;
DATALINES;
1573 1997 50 1080
1581 1997 51 300
1598 1996 54 80
1598 1998 54 80
1598 1999 54 80
1602 1996 55 112.6
1602 1997 55 335.965
;
RUN;
That being said, the following MAY give you what you're looking for, but it's only a guess, as I'm not sure this is really what you're asking:
proc sql noprint;
create table testout as
select *,count(*) as cnt
from test
group by sum
having cnt > 1;
quit;
Are you asking: show all rows where the same SUM is used or something else?
Assuming I understand your question correctly, you would like to keep the observations from the same company/individual only if the company has the same numsecur every year. So, here is what I would try in Stata:
input ID YEAR EXEC SUM
1573 1997 50 1080 //
1581 1997 51 300 //
1598 1996 54 80 //
1598 1998 54 80 //
1598 1999 54 80 //
1602 1996 55 112.6 //
1602 1997 55 335.965 //
1575 1997 50 1080 //
1575 1998 51 1080 //
1595 1996 54 80 //
1595 1998 54 30 //
1595 1999 54 80 //
1605 1996 55 112.6 //
1605 1997 55 335.965 //
end
bysort ID SUM: gen drop=cond(_N==1, 0,_n)
drop if drop==0
The results (based on my data):
ID YEAR EXEC SUM drop
1. 1575 1997 50 1080 1
2. 1575 1998 51 1080 2
3. 1595 1999 54 80 1
4. 1595 1996 54 80 2
5. 1598 1996 54 80 1
6. 1598 1998 54 80 2
7. 1598 1999 54 80 3
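For comparison, here is a minimal Python sketch of a stricter reading of the question: keep an ID only if its SUM is identical in every one of its rows. It assumes SUM stands in for numsecur, and the function name keep_constant_ids and the min_years parameter are hypothetical names used only for illustration. It is not SAS or Stata code; it only shows the grouping logic.

```python
from collections import defaultdict

# Rows are (ID, YEAR, EXEC, SUM), copied from the Stata input above.
rows = [
    (1573, 1997, 50, 1080), (1581, 1997, 51, 300),
    (1598, 1996, 54, 80), (1598, 1998, 54, 80), (1598, 1999, 54, 80),
    (1602, 1996, 55, 112.6), (1602, 1997, 55, 335.965),
    (1575, 1997, 50, 1080), (1575, 1998, 51, 1080),
    (1595, 1996, 54, 80), (1595, 1998, 54, 30), (1595, 1999, 54, 80),
    (1605, 1996, 55, 112.6), (1605, 1997, 55, 335.965),
]


def keep_constant_ids(rows, min_years=2):
    """Keep rows whose ID has exactly one distinct SUM over at least min_years rows.

    min_years is an assumption: whether an ID seen in a single year should
    count as "the same every year" is not stated in the question.
    """
    sums_by_id = defaultdict(list)
    for id_, _year, _exec, total in rows:
        sums_by_id[id_].append(total)
    keep = {
        id_ for id_, sums in sums_by_id.items()
        if len(sums) >= min_years and len(set(sums)) == 1
    }
    return [row for row in rows if row[0] in keep]


kept = keep_constant_ids(rows)
for row in kept:
    print(*row)
```

Under this reading ID 1595 is dropped entirely, because its SUM is 30 in 1998 and 80 in the other years, whereas the bysort ID SUM approach keeps its two rows with SUM 80. Which behaviour is wanted depends on what the question actually means.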
I'm trying to query my custom logs table (e.g. CustomData_CL) over a given time range. The query should return only the rows that fall within that range, and I want to find the data size of that output.
The query I used to fetch the time-ranged output:
CustomData_CL
| where TimeGenerated between (datetime(2022–09–14 04:00:00) .. datetime(2020–09–14 05:00:00))
But it is giving the following error:
Can anyone suggest what is wrong?
Note the characters with code point 8211.
These are not standard hyphens (-) 🙂.
let p_str = "(datetime(2022–09–14 04:00:00) .. datetime(2020–09–14 05:00:00))";
print str = p_str
| mv-expand str = extract_all("(.)", str) to typeof(string)
| extend dec = to_utf8(str)[0]
str      dec
(        40
d        100
a        97
t        116
e        101
t        116
i        105
m        109
e        101
(        40
2        50
0        48
2        50
2        50
–        8211
0        48
9        57
–        8211
1        49
4        52
(space)  32
0        48
4        52
:        58
0        48
0        48
:        58
0        48
0        48
)        41
(space)  32
.        46
.        46
(space)  32
d        100
a        97
t        116
e        101
t        116
i        105
m        109
e        101
(        40
2        50
0        48
2        50
0        48
–        8211
0        48
9        57
–        8211
1        49
4        52
(space)  32
0        48
5        53
:        58
0        48
0        48
:        58
0        48
0        48
)        41
)        41
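The same kind of check can be done outside Kusto before pasting a query. The following is a small Python sketch, not part of the original answer; find_non_ascii and fix_dashes are hypothetical helper names. It lists every character above the ASCII range with its position and code point, then replaces the en dash (code point 8211) with a plain hyphen.

```python
EN_DASH = "\u2013"  # code point 8211, the character found in the query

# Fragment of the pasted query, with the en dashes written as escapes.
query = "datetime(2022\u201309\u201314 04:00:00)"


def find_non_ascii(text):
    """Return (index, character, code point) for every non-ASCII character."""
    return [(i, ch, ord(ch)) for i, ch in enumerate(text) if ord(ch) > 127]


def fix_dashes(text):
    """Replace en dashes with the ASCII hyphen-minus that datetime() expects."""
    return text.replace(EN_DASH, "-")


print(find_non_ascii(query))
print(fix_dashes(query))
```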
Update, per OP request:
Please note that in addition to the use of a wrong character that caused the syntax error, your 2nd datetime year was wrong.
// Generation of mock table. Not part of the solution
let CustomData_CL = datatable(TimeGenerated:datetime)[datetime(2022-09-14 04:30:00)];
// Solution starts here
CustomData_CL
| where TimeGenerated between (datetime(2022-09-14 04:00:00) .. datetime(2022-09-14 05:00:00))
TimeGenerated
2022-09-14T04:30:00Z
I'm preparing the material for a KQL course, and I thought about creating a challenge, based on your question.
Check out what happened when I posted your code into Kusto Web Explorer... 🙂
How cool is that?!
I'm writing a bingo game in python. So far I can generate a bingo card and print it.
My problem is after I've randomly generated a number to call out, I don't know how to 'cross out' that number on the card to note that it's been called out.
This is the output, a randomly generated card:
B 11 13 14 2 1
I 23 28 26 27 22
N 42 45 40 33 44
G 57 48 59 56 55
O 66 62 75 63 67
I was thinking of using random.sample and pop to pick the number to call out (in bingo the numbers go from 1 to 75):
random_draw_list = random.sample(range(1, 76), 75)
number_drawn = random_draw_list.pop()
How can I write a function that will 'cross out' a number on the card after it's been called?
So, for example, if number_drawn is 11, it should replace 11 on the card with an x or a zero.
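One possible approach, as a minimal sketch: the question does not show how the card is stored, so this assumes a dict that maps each letter to a list of numbers. The function name cross_out and the mark parameter are hypothetical. Because every number appears at most once on a bingo card, the function can stop at the first match.

```python
import random


def cross_out(card, number, mark="x"):
    """Replace number on the card with mark; return True if it was on the card."""
    for row in card.values():
        if number in row:
            row[row.index(number)] = mark
            return True
    return False


# Assumed representation of the card printed in the question.
card = {
    "B": [11, 13, 14, 2, 1],
    "I": [23, 28, 26, 27, 22],
    "N": [42, 45, 40, 33, 44],
    "G": [57, 48, 59, 56, 55],
    "O": [66, 62, 75, 63, 67],
}

random_draw_list = random.sample(range(1, 76), 75)
number_drawn = random_draw_list.pop()
was_on_card = cross_out(card, number_drawn)

print("Drawn:", number_drawn, "on card:", was_on_card)
for letter, row in card.items():
    print(letter, *row)
```

Passing mark=0 instead of the default gives the zero variant mentioned in the question.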
This is my data, and I'm trying to find the formula behind it. How can I do that? (It doesn't have to be in Excel; I just don't know how to do it.)
       0      2      4      6      8
0    100     90     80     70     60
2     85  64.49   53.5  48.15     50
4     70   48.9  38.43  35.03     40
6     55  38.78  30.39  27.07     30
8     40     35     30     25     20
And this is the graph that I obtain.
But when I try to fit the data, I can't find the option the way I can for a 2D graph.
OK, I didn't find how to get the equation for those values, but this problem is solved by bilinear interpolation.
I used this video: https://www.youtube.com/watch?v=va8vFViss90
and this calculator to make sure that I didn't mess it up: https://www.ajdesigner.com/phpinterpolation/bilinear_interpolation_equation.php#ajscroll
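For reference, bilinear interpolation on the table from the question can be sketched in Python as follows. This is not the method from the video or the calculator, just the standard formula. It assumes the top row of the table holds the x values and the first column holds the y values; the names cell_index and bilinear are hypothetical.

```python
from bisect import bisect_right

xs = [0, 2, 4, 6, 8]  # assumed x axis: the column headers
ys = [0, 2, 4, 6, 8]  # assumed y axis: the row labels
grid = [              # grid[j][i] is the value at y = ys[j], x = xs[i]
    [100, 90, 80, 70, 60],
    [85, 64.49, 53.5, 48.15, 50],
    [70, 48.9, 38.43, 35.03, 40],
    [55, 38.78, 30.39, 27.07, 30],
    [40, 35, 30, 25, 20],
]


def cell_index(axis, value):
    """Return i such that axis[i] <= value <= axis[i + 1]."""
    if not axis[0] <= value <= axis[-1]:
        raise ValueError("value lies outside the table")
    return min(bisect_right(axis, value) - 1, len(axis) - 2)


def bilinear(x, y):
    """Interpolate linearly along x on two neighbouring rows, then along y."""
    i = cell_index(xs, x)
    j = cell_index(ys, y)
    tx = (x - xs[i]) / (xs[i + 1] - xs[i])
    ty = (y - ys[j]) / (ys[j + 1] - ys[j])
    upper = grid[j][i] * (1 - tx) + grid[j][i + 1] * tx
    lower = grid[j + 1][i] * (1 - tx) + grid[j + 1][i + 1] * tx
    return upper * (1 - ty) + lower * ty


print(bilinear(1, 1))
print(bilinear(3, 5))
```

At a grid point the function returns the tabulated value, and in the middle of a cell it returns the average of the four corners, which is a quick way to check the result against an online calculator.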
I have a text file consisting of data in tab-delimited columns. There are many ways to read data from a file into Python, but I am specifically trying to use a method similar to the one outlined below. When using a context manager like with open(...) as ..., I've seen that the general concept is to have all of the subsequent code indented within the with statement. Yet when defining a function, the return statement is usually placed at the same indentation as the first line of code within the function (excluding cases with awkward if-else branches). In this case, both approaches work. Is one method considered correct or generally preferred over the other?
import numpy as np

def read_in(fpath, contents=None, row_limit=np.inf):
    """
    fpath is filelocation + filename + '.txt'
    contents is the initial data that the file data will be appended to
    row_limit is the maximum number of rows to be read (in case one would like to not read in every row).
    """
    # Avoid a mutable default argument: a shared [] would keep growing across calls.
    if contents is None:
        contents = []
    nrows = 0
    with open(fpath, 'r') as f:
        for row in f:
            if nrows < row_limit:
                contents.append(row.split())
                nrows += 1
            else:
                break
        # return contents
    return contents
Below is a snippet of the text-file I am using for this example.
1996 02 08 05 17 49 263 70 184 247 126 0 -6.0 1.6e+14 2.7e+28 249
1996 02 12 05 47 26 91 53 160 100 211 236 2.0 1.3e+15 1.6e+29 92
1996 02 17 02 06 31 279 73 317 257 378 532 9.9 3.3e+14 1.6e+29 274
1996 02 17 05 18 59 86 36 171 64 279 819 27.9 NaN NaN 88
1996 02 19 05 15 48 98 30 266 129 403 946 36.7 NaN NaN 94
1996 03 02 04 11 53 88 36 108 95 120 177 1.0 1.5e+14 8.7e+27 86
1996 03 03 04 12 30 99 26 186 141 232 215 2.3 1.6e+14 2.8e+28 99
And below is a sample call.
fpath = "/Users/.../sample_data.txt"
data_in = read_in(fpath)
for row in data_in:
    print(row)
(I realize that it's better to use chunks of pre-defined sizes to read in data, but the number of characters per row of data varies. So I'm instead trying to give user control over the number of rows read in; one could read in a subset of the rows at a time and append them into contents, continually passing them into read_in - possibly in a loop - if the file size is large enough. That said, I'd love to know if I'm wrong about this approach as well, though this isn't my main question.)
If your function needs to do some other things after writing to the file, you usually do it outside the with block. So essentially you need to return outside the with block too.
However, if the purpose of your function is just to read in a file, you can return within the with block or outside it. I believe neither approach is preferred in this case.
I don't really understand your second question.
You can also put the return within the with context.
When the context exits, the cleanup is done. This is the power of with: you don't need to check all possible exit paths. Note: the exit handler is also called when an exception is raised inside the with block.
But if the file is empty (as an example), you should still return something. In that case your code is clear and follows the principle of a single exit path. But if you need to handle reaching the end of the file without finding something important, I would put the normal return within the with context and handle the special case after it.
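A small sketch that checks this behaviour: both functions below read the same file, one returning inside the with block and one after it. They also return the file object, purely so the caller can inspect f.closed; real code would not do that. The function names and the temporary sample file are hypothetical.

```python
import os
import tempfile


def read_return_inside(path):
    with open(path) as f:
        rows = [line.split() for line in f]
        return f, rows  # return inside the with block


def read_return_outside(path):
    with open(path) as f:
        rows = [line.split() for line in f]
    return f, rows  # return after the with block


# Write a tiny sample file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample_data.txt")
    with open(path, "w") as out:
        out.write("1996 02 08\n1996 02 12\n")
    f_inside, rows_inside = read_return_inside(path)
    f_outside, rows_outside = read_return_outside(path)

# The file is closed in both cases, and the data read is identical.
print(f_inside.closed, f_outside.closed, rows_inside == rows_outside)
```

The context manager closes the file as soon as control leaves the block, whether that happens by falling off the end, by return, or by an exception, so the choice between the two placements is a matter of readability.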
When extracting and moving data, the first column of the criteria works, but the second criterion is not being applied. The filter returns the movement for all stores that sold that item.
List of column headers.
R2=Left Len,
S2=Store
A2=Left Len,
B2=UPC,
C2=Store,
D2=Movement,
The file is just short of 900k rows of data in total.
I believe it to be an issue with Current Region.
I also need this to return zero if there is no movement for that store. This will be repeated 39 more times to the right in order to get results for each location.
The ultimate goal is to find the zero movers that need to be addressed, so the rows of UPCs need to stay aligned with the criteria.
Any help would be greatly appreciated.
Using Windows 7,
Office 2016
Sub Find_Fill_Data()
    Range("u2:x" & Range("x" & Rows.Count).End(xlUp).Row).ClearContents
    Range("a2:D" & Range("D" & Rows.Count).End(xlUp).Row).AdvancedFilter Action:=xlFilterCopy, criteriarange:=Range("r2").CurrentRegion, copytorange:=Range("u2"), unique:=False
    Range("q4").Select
End Sub
Left Len    Item                           5    7    8    9
1070002152  MILK DUDS THEATER BOX        123  254  181  196
1070002385  WHOPPERS MALT BALLS           19    0   28   42
1070002440  WHOPPERS MALT BALLS           92  188   79  133
1070002660  WHOPPERS MALT BALLS           22   21   11   22
1070006080  CANDY BAR                    575  463  446  303
1070006611  WHOPPER ROBIN EGGS            22   28   25    0
1070008807  CANDY                        132   57   59    0
1070008813  THEATER BOX                  331  127  101  171
1070013272  J/RANCHER CRNCH CHEW ASST     61    0    0    0
1070050180  WHOPPERS MALT BALLS CARTN    119   24   99   99
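The lookup the question describes (movement per item and store, with zero where a store has no row) can be sketched in Python as a check on the intended result. This is not VBA and not a fix for the AdvancedFilter call. It assumes the columns 5, 7, 8 and 9 in the table above are store numbers, and it uses two rows from that table with the zero entries left out to mimic stores that have no row in the source data. The names movement_table and zero_movers are hypothetical.

```python
# (Left Len, Store, Movement) records. Stores with no movement have no record.
records = [
    ("1070002385", 5, 19),
    ("1070002385", 8, 28),
    ("1070002385", 9, 42),
    ("1070013272", 5, 61),
]
stores = [5, 7, 8, 9]


def movement_table(records, stores):
    """Map each item to its movement per store, using 0 where no record exists."""
    lookup = {(item, store): movement for item, store, movement in records}
    items = sorted({item for item, _store, _movement in records})
    return {item: [lookup.get((item, s), 0) for s in stores] for item in items}


def zero_movers(table, stores):
    """List the (item, store) pairs whose movement is zero."""
    return [
        (item, store)
        for item, row in table.items()
        for store, movement in zip(stores, row)
        if movement == 0
    ]


table = movement_table(records, stores)
for item, row in table.items():
    print(item, *row)
print(zero_movers(table, stores))
```

Keying the lookup on both the item and the store is the point of the sketch: matching on the item alone would return movement for every store that sold it, which is the symptom described in the question.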