I have written code to plot a chart that is supposed to show one name for each collection (series).
There are 10 to 15 collections, so labelling every point in the chart gets messy.
I want the points of each group to sit together and carry a single caption, "a" or "b", for the whole group.
My data is on the intake port parameter sheet and looks like the table below; the chart plots V1 against V2. The name column (a, b, ...) holds the variable name and case is the case of that variable.
name case V1 V2
a 1 10 11
a 2 12 11
a 3 12 12
a 4 12 11
a 5 12 12
b 1 15 12
b 2 10 10
b 3 15 11
b 4 12 15
b 5 19 12
C 1 01 02
C 2 02 01
C 3 02 11
....
For j = 2 To i
    a = 4 + (j - 1) * 6
    ActiveChart.SeriesCollection.NewSeries
    ActiveChart.SeriesCollection(j).Name = "=INTAKE_PORT_PARAMETER!C" & a
    ActiveChart.SeriesCollection(j).XValues = "=INTAKE_PORT_PARAMETER!Z" & a & ":Z" & a + 5
    ActiveChart.SeriesCollection(j).Values = "=INTAKE_PORT_PARAMETER!AA" & a & ":AA" & a + 5
    ActiveChart.ChartArea.Select
    ActiveChart.SeriesCollection(j).ApplyDataLabels
    ActiveChart.SeriesCollection(j).DataLabels.Item(1).Select
    Selection.ShowSeriesName = True
    Selection.ShowValue = False
    Selection.Format.TextFrame2.TextRange.Font.Size = 8
    ActiveChart.SeriesCollection(j).DataLabels.Select
    Selection.ShowValue = False
Next j
When I step through the code manually (pressing F8), it works.
The points and labels in the chart look the way I want:
| ..(a)
| .. **(b)
|
| ## (C)
|____________________
When I run it with F5 instead of stepping through, each collection is labelled with one of its values and the collection name is not shown:
| ..(10)
| .. **(15)
|
| ## (01)
|____________________
I couldn't post the pictures of the graphs. My company treats those as confidential data.
Your question is unclear in several ways.
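Are V1 and V2 your X and Y values? If so, it's easy with a few formulas. Assuming name, case, V1 and V2 are in columns A:D with headers in row 1, and the group names (a, b, C) are typed into E1:G1, the formula in cell E2, filled into E2:G14, is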
=IF(E$1=$A2,$D2,NA())
Select C1:C14, then hold Ctrl while selecting E1:G14, and insert your scatter chart.
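To make the reshaping idea concrete, here is a minimal Python sketch of what the formula does: each row keeps its V2 value only in the column of its own group and gets a blank (None, standing in for NA()) everywhere else, so every group becomes its own series. The rows are a shortened copy of the question's sample, and the variable names are made up for illustration.

```python
# Rows copied from the question's sample: (name, case, V1, V2).
rows = [
    ("a", 1, 10, 11), ("a", 2, 12, 11), ("a", 3, 12, 12),
    ("b", 1, 15, 12), ("b", 2, 10, 10),
    ("C", 1, 1, 2), ("C", 2, 2, 1),
]

# One output column per distinct name, in order of first appearance.
groups = list(dict.fromkeys(name for name, _, _, _ in rows))

# Mirror =IF(E$1=$A2,$D2,NA()): V2 goes into the matching group column,
# None (playing the role of NA()) into all the others.
table = [
    [v1] + [v2 if name == g else None for g in groups]
    for name, _, v1, v2 in rows
]

print(["V1"] + groups)
for line in table:
    print(line)
```

Each group column can then be plotted as a separate series against V1, which is what allows one series name per group.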
import pandas as pd

# df is assumed to be loaded already; the daily readings start at column position 5
x=[]
y1=[]
r1=len(df)
L1=len(df.columns)
for i in range(r1):
    ll=(df.loc[i,'LL'])
    ul=(df.loc[i,'UL'])
    count1 =0
    for j in range(5,L1):
        # replace non-numeric readings with 0 (iloc, because j is a position, not a label)
        if isinstance(df.iloc[i,j],str):
            df.iloc[i,j]=0
        if ll<=df.iloc[i,j]<=ul:
            count1=count1+1
    if count1==(L1-5):
        x.append('Pass')
    else:
        x.append('Fail')
    y1.append(count1)
se = pd.Series(x)
se1=pd.Series(y1)
# per-row statistics, computed before the result columns are appended to df
min1 = df.iloc[:,5:].min(axis=1)
mean1=df.iloc[:,5:].astype(float).mean(axis=1,skipna = True)
median1=df.iloc[:,5:].astype(float).median(axis=1,skipna = True)
max1=df.iloc[:,5:].max(axis=1)
count1=df.iloc[:,5:].count(axis=1)
df['Min']=min1.values
df['Mean']=mean1.values
df['Median']=median1.values
df['Max']=max1.values
df['Pass Count']=se1.values
df['Result']=se.values
yield1=[]
for i in range(len(se1)):
    yd1=(se1[i]/(L1-3))*100
    yield1.append(yd1)
se2=pd.Series(yield1)
df['Yield']=se2.values
df1=df.loc[:,['PARAMETER','Min','Mean','Median','Max','Result','Pass Count','Yield']]
df1
Below is my data set: daily sensor data. Each day's reading should lie between the Lower Limit (LL) and the Upper Limit (UL). I want to count, for each sensor, the number of days on which the reading is within LL and UL.
I have not been able to calculate this with pandas. How can I calculate the number of days on which the sensor data is within LL and UL?
A few key ideas:
build a list of the day columns that go into the calculation (daycols);
transpose these columns so that comparing them against LL and UL gives a boolean array;
sum this boolean array and you have the desired count.
import io
import pandas as pd

df = pd.read_csv(io.StringIO("""sensor location,LL,UL,day1,day2,day3,day4,day5,day6,day7,number of days sensor data within LL and UL
A,1,10,12,6,9,4,9,7,15,5
B,1,12,4,15,7,1,11,1,7,6
C,1,15,13,13,13,10,7,13,13,7
D,1,10,12,1,14,12,15,4,4,3
E,1,20,11,15,8,14,1,14,14,7"""))
# day columns only; skip the "number of days ..." column, which also contains "day"
daycols = [d for d in df.columns if "day" in d and "number" not in d]
df = df.assign(
    # True counts as 1, so summing a boolean array gives the answer
    daysBetween=lambda dfa: ((dfa.loc[:,daycols].T>=dfa["LL"]) &
                             (dfa.loc[:,daycols].T<=dfa["UL"])).sum()
)
print(df.to_string(index=False))
output
sensor location LL UL day1 day2 day3 day4 day5 day6 day7 number of days sensor data within LL and UL daysBetween
A 1 10 12 6 9 4 9 7 15 5 5
B 1 12 4 15 7 1 11 1 7 6 6
C 1 15 13 13 13 10 7 13 13 7 7
D 1 10 12 1 14 12 15 4 4 3 3
E 1 20 11 15 8 14 1 14 14 7 7
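Speed-up
If you have many columns, you can use a slice to identify them and turn them into positional indexes so iloc can be used. The transpose is also unnecessary, provided the comparison is aligned on the row index (axis=0) and the sum is taken across the columns (axis=1).
# run this on the freshly loaded df (before daysBetween is added), so that [3:-1] selects only day1..day7
dayi = [df.columns.get_loc(c) for c in df.columns[3:-1]]
df = df.assign(
    # True counts as 1, so summing each row of the boolean frame gives the answer
    daysBetween=lambda dfa: (dfa.iloc[:,dayi].ge(dfa["LL"], axis=0) &
                             dfa.iloc[:,dayi].le(dfa["UL"], axis=0)).sum(axis=1)
)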
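As a dependency-free cross-check of the counting idea, the same calculation can be sketched in plain Python: a boolean per reading, summed per sensor. The data is copied from the sample above; the dictionary layout and names are only illustrative.

```python
# sensor -> (LL, UL, daily readings), copied from the sample data above
readings = {
    "A": (1, 10, [12, 6, 9, 4, 9, 7, 15]),
    "B": (1, 12, [4, 15, 7, 1, 11, 1, 7]),
    "C": (1, 15, [13, 13, 13, 10, 7, 13, 13]),
    "D": (1, 10, [12, 1, 14, 12, 15, 4, 4]),
    "E": (1, 20, [11, 15, 8, 14, 1, 14, 14]),
}

# True counts as 1, so summing the comparisons gives the number of days in range.
days_between = {
    sensor: sum(ll <= value <= ul for value in days)
    for sensor, (ll, ul, days) in readings.items()
}

print(days_between)
```

The counts agree with the "number of days sensor data within LL and UL" column in the sample.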
I am trying to get a simple VLOOKUP to work, but it only returns the first matching value.
Sample Data
TABA
V
1
2 1
3 X
X = =SUM(VLOOKUP(V2,TABB!$K:$M,3,FALSE))
TABB
K L M
1 1 Hello 45
2 8 Hello 30
3 1 Hello 20
4 6 Hello 60
5 1 Hello 90
6 3 Hello 10
7 1 Hello 80
8 1 Hello 75
Current Output
=SUM(VLOOKUP(V2,TABB!$K:$M,3,FALSE))
Is returning 45 (the first value).
Expected Output
=SUM(VLOOKUP(V2,TABB!$K:$M,3,FALSE))
I want it to return 310 (the SUM of values that match 1).
VLOOKUP can only return the first matching value.
In your TABB sheet, create a helper column in column N,
e.g. in N1 with
=SUMIF($K$1:$K$8,$K1,$M$1:$M$8)
Drag the formula down.
Then use
=VLOOKUP(V2,TABB!$K:$N,4,FALSE)
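The difference between the two behaviours can be sketched in a few lines of Python. This is only an illustration of the lookup semantics, using the K and M values from the sample data; it is not how Excel evaluates the formulas internally.

```python
# (K, M) pairs copied from the TABB sample
tabb = [(1, 45), (8, 30), (1, 20), (6, 60), (1, 90), (3, 10), (1, 80), (1, 75)]
key = 1

# VLOOKUP-style exact match: stops at the first row whose key matches.
first_match = next(m for k, m in tabb if k == key)

# SUMIF-style aggregation: adds up M for every row whose key matches.
total = sum(m for k, m in tabb if k == key)

print(first_match, total)
```

With the helper column, every row with the same key carries the same total, so it no longer matters that the lookup stops at the first match.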
My dataset is in the following form:
clear
input id var
1 20
1 21
1 32
1 34
2 11
2 .
2 15
3 21
3 22
3 1
3 2
3 5
end
In my true dataset, observations are sorted by id and by year (not shown here).
What I need to do is to drop all the rows of a specific id if (at least) one of the following two conditions is met:
there is at least one missing value of var.
var decreases from one row to the next (for the same id)
So in my example what I would like to obtain is:
id var
1 20
1 21
1 32
1 34
My unfortunate attempt was to use row-wise operations together with by, in order to create a drop1 variable to be used later to subset the dataset.
Something along these lines (which is clearly wrong):
bysort id: gen drop1=1 if var[_n] < var[_n-1] | var[_n]==.
This doesn't work, and I am not even sure that this is the cleanest and most direct way to solve the task.
How would you proceed? Any help would be highly appreciated.
My interpretation is that you want to drop the complete group if either of the two conditions is met. I assume your dataset is sorted in some way, most likely on another variable. Otherwise, the structure is fragile.
The logic is simple. Check for decreasing values but leave out the first observation of each group, i.e., leave out _n == 1. For that observation var[_n-1] is missing, and Stata treats missing as larger than any number, so a non-missing first observation would always look like a decrease. Then, also check for missing values.
clear
set more off
input id var
1 20
1 21
1 32
1 34
2 11
2 .
2 15
3 21
3 22
3 1
3 2
3 5
end
// maintain original sequencing
gen orig = _n
order id orig
bysort id (orig) : gen todrop = sum((var < var[_n-1] & _n > 1) | missing(var))
list, sepby(id)
by id : drop if todrop[_N]
list, sepby(id)
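For readers who want to check the group-level logic outside Stata, here is a rough Python sketch of the same rule: drop the whole id if any value is missing or any value is smaller than the one before it. The data is the question's sample with None standing in for Stata's missing value; the function name is made up for illustration.

```python
from itertools import groupby

# (id, var) pairs from the question; None stands in for a missing value
data = [
    (1, 20), (1, 21), (1, 32), (1, 34),
    (2, 11), (2, None), (2, 15),
    (3, 21), (3, 22), (3, 1), (3, 2), (3, 5),
]

def keep_group(values):
    """Keep a group only if it has no missing value and never decreases."""
    if any(v is None for v in values):
        return False
    # Compare each value with its predecessor; the first one has none.
    return all(prev <= cur for prev, cur in zip(values, values[1:]))

kept = []
for gid, grp in groupby(data, key=lambda row: row[0]):  # data is sorted by id
    grp = list(grp)
    if keep_group([v for _, v in grp]):
        kept.extend(grp)

print(kept)
```

Only id 1 survives, which matches the output requested in the question.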
One way to do this is to create some indicator variable as you had attempted. If you only want to drop where var decreases from one observation to the next, you could use:
clear
input id var
1 20
1 21
1 32
1 34
2 11
2 .
2 15
3 21
3 22
3 1
3 2
3 5
4 .
4 2
end
gen i = id if mi(var)
bysort id : egen k = mean(i)
drop if id == k
drop i k
drop if var[_n-1] > var[_n] & _n != 1
However, if you want to get the output you supplied in the post (drop all subsequent observations where var decreases from some max value), you could try the following in place of the last line above.
local N = _N
forvalues i = 1/`N' {
drop if var[_n-1] > var[_n] & _n != 1
}
The loop just ensures that the drop if var... section of code is executed enough so that all observations where var < 34 are dropped.
Is there another method of doing this calculation that will allow me to drag across cells and have the column index range adjust accordingly?
{=SUM(VLOOKUP(value,array,{1,2,3,4},0))}
Unfortunately, the column index numbers don't increase as I drag the formula across (i.e. {1,2,3,4} --> {2,3,4,5}, etc.). Any suggestions on how to do this without having to enter them manually?
Any help would be much appreciated!
I suggest using the COLUMN() function, which returns the number of the column it is called from.
In column B : Column() = 2
When you use this function in place of the fixed index numbers in your formula:
In column K : Column() = 11
Column()- 11 + 1 = 1
Column()- 11 + 2 = 2
Column()- 11 + 3 = 3
Column()- 11 + 4 = 4
Now, after copying it to column L : Column() = 12
Column()- 11 + 1 = 2
Column()- 11 + 2 = 3
Column()- 11 + 3 = 4
Column()- 11 + 4 = 5
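The arithmetic above can be checked with a small Python sketch. The function and parameter names are hypothetical; 11 is the number of column K, where the formula is assumed to start, as in the example above.

```python
def column_indexes(current_column, anchor_column=11, width=4):
    """Index numbers produced by COLUMN() - anchor + k for k = 1..width."""
    return [current_column - anchor_column + k for k in range(1, width + 1)]

print(column_indexes(11))  # formula sitting in column K
print(column_indexes(12))  # same formula dragged to column L
```

Every step to the right raises all four index numbers by one, which is the shift from {1,2,3,4} to {2,3,4,5} asked for in the question.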
I have the following code:
NI1=[NI{:,1} NI{:,2} NI{:,3}];
[~,NI2]=sort(NI1(:,2));
NI1=NI1(NI2,:);
NI1((NI1(:,3) == 0),:) = [];
NI1=unique(NI1(:,1:3),'rows');
NI3= unique(NI1(:,1:2),'rows')
for mj=1:size(NI3,1)
    NI3(mj,3)=sum(NI1(:,1) == NI3(mj,1) & NI1(:,2)==NI3(mj,2));
end
My initial cell array NI1 has the following columns: 1) the year; 2) a code that identifies a bank; 3) a code that identifies a worker of the bank. Example:
c1 c2 c3
1997 3 850
1997 3 1024
1997 3 5792
My output NI3 counts how many analysts (c3) work in each bank (c2) in each year (c1), for instance:
c1 c2 c3
1997 3 14
1997 7 84
1997 11 15
1998 4 1
1998 15 10
1998 3 12
1999 11 17
Now I am trying to apply exactly the same code, but my last column (c3) is a string, so the initial cell array fir_ins is the following:
1997 3 'ACAD'
1997 3 'ADCT'
1997 3 'ADEX'
I want to obtain exactly the same output as in NI3, but I have to change the code, since my last column is a string.
I am only missing the last part, this is the code I have so far.
ESTIMA=num2cell(I{:,6});
ANALY=num2cell(I{:,7});
YEAR = num2cell(T_ANNDAT3);
fir_ins=[YEAR ESTIMA I{:,1}];
fir_ins= sortrows(fir_ins,2);
[~, in2,~] = unique(strcat(fir_ins(:,2),fir_ins(:, 3)));
fir_ins = fir_ins(in2,:);
fir_ins= sortrows(fir_ins,[1 2]);
fir_ins2=fir_ins(:,1:2);
fir_ins2=unique(cell2mat(fir_ins2(:,1:2)),'rows');
This part is not working:
for jm=1:size(fir_ins2,1)
fir_ins2(jm,3)=sum(cell2mat(fir_ins(:,1))) == fir_ins2(jm,1) & cell2mat(fir_ins(:,2))==cell2mat(fir_ins2(jm,2));
end
You can perform this "aggregation" more efficiently with the help of the accumarray function. The idea is to map the first two columns (the primary keys of the rows) into subscripts (indices starting from 1), then pass those subscripts to accumarray to do the counting.
Below is an example to illustrate. First I start by generating some random data resembling yours:
% here are the columns
n = 150;
c1 = sort(randi([1997 1999], [n 1])); % years
c2 = sort(randi([3 11], [n 1])); % bank code
c3 = randi(5000, [n 1]); % employee ID as a number
c4 = cellstr(char(randi(['A' 'Z']-0, [n,4]))); % employee ID as a string
% combine records (NI)
X = [c1 c2 c3]; % the one with numeric worker ID
X2 = [num2cell([c1 c2]) c4]; % {c1 c2 c4} % the one with string worker ID
Note that for our purposes, it doesn't matter if the workers ID column is expressed as numbers or string; we won't be using them, only the first two columns that represent the "primary keys" of the rows are used:
% find the unique primary keys and their subscript mapping
[years_banks,~,ind] = unique([c1 c2], 'rows');
% count occurrences (as in SQL: SELECT COUNT(..) FROM .. GROUP BY ..)
counts = accumarray(ind, 1);
% build final matrix: years, bank codes, counts
M = [years_banks counts];
I got the following result with my fake data:
>> M
M =
1997 3 13
1997 4 11
1997 5 15
1997 6 14
1997 7 4
1998 7 11
1998 8 24
1998 9 15
1999 9 1
1999 10 22
1999 11 20
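The same group-and-count idea can be sketched in Python for comparison. It shows why the type of the worker ID column does not matter: only the (year, bank) key is counted. The first three records come from the question; the remaining ones are made up for illustration, including one duplicate that is removed first, as in the question's own code.

```python
from collections import Counter

# (year, bank code, analyst ID as a string)
records = [
    (1997, 3, "ACAD"), (1997, 3, "ADCT"), (1997, 3, "ADEX"),
    (1997, 7, "ACAD"), (1998, 3, "ADCT"),   # made-up rows
    (1997, 3, "ACAD"),                      # made-up duplicate
]

# Remove duplicate (year, bank, analyst) rows, then count rows per (year, bank).
unique_records = set(records)
counts = Counter((year, bank) for year, bank, _ in unique_records)

# Final table: year, bank code, number of distinct analysts.
result = sorted((year, bank, n) for (year, bank), n in counts.items())
print(result)
```

The set plays the role of unique(..., 'rows') and the Counter plays the role of accumarray(ind, 1).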