When I run the code for one particular state name (see Last_residence in the code):
andhrapradesh.query('Duration_of_residence=="All durations of residence" & Last_residence_R_or_U=="Urban" & Last_residence=="Jammu & Kashmir"',inplace=True)
print(andhrapradesh['Total_migrants'].sum())
It gives the desired sum of the outflow value for that state from the CSV loaded with pandas.
But when I try to calculate it for every state name, I get the error "UndefinedVariableError: name 'Jammu & Kashmir' is not defined"
states = ["Jammu & Kashmir","Punjab",'Himachal Pradesh']
for name in states:
    andhrapradesh.query(f'Duration_of_residence=="All durations of residence" & Last_residence_R_or_U=="Urban" & Last_residence=={name}',inplace=True)
    print(andhrapradesh['Total_migrants'].sum())
Can you please help me figure out why it raises this error, and how I can do this for all values in the list states?
You need quotes around name like you had in your manual code:
df.query(f"... Last_residence == \"{name}\"")
That way the value is read as a string to compare against, e.g. "Punjab", and not as a bare Punjab, which is looked up as a name (e.g. a column name), hence the error.
(Single quotes work too.)
However, there's a better way: you can prefix a variable with the @ symbol, so you don't even need the f-string:
df.query("... Last_residence == @name")
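A minimal sketch of the full loop using the @ prefix. The small DataFrame below is made-up stand-in data (only the column names come from the question), and the name `totals` is introduced here for illustration. Note that it also drops inplace=True: with inplace=True the first iteration would filter the frame down to one state, leaving nothing for the later states to match.

```python
import pandas as pd

# Hypothetical stand-in for the migration CSV; column names follow the question.
df = pd.DataFrame({
    "Duration_of_residence": ["All durations of residence"] * 4,
    "Last_residence_R_or_U": ["Urban", "Urban", "Rural", "Urban"],
    "Last_residence": ["Jammu & Kashmir", "Punjab", "Punjab", "Himachal Pradesh"],
    "Total_migrants": [10, 20, 5, 7],
})

states = ["Jammu & Kashmir", "Punjab", "Himachal Pradesh"]
totals = {}
for name in states:
    # No inplace=True: query() returns a filtered copy, so df stays intact
    # for the next state. @name refers to the Python variable `name`.
    subset = df.query(
        'Duration_of_residence == "All durations of residence" '
        '& Last_residence_R_or_U == "Urban" '
        '& Last_residence == @name'
    )
    totals[name] = int(subset["Total_migrants"].sum())

print(totals)
```

Collecting the sums in a dict keeps each state's result available after the loop instead of only printing it.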
I need to merge the two string values, but without the source URL and the " " if possible. Everything I have tried gives only the result below, which has the right values but still includes the source URL.
CODE:
SELECT (fn:concat ((str(?titre)), ". " , (str(?membreDeST))) AS ?nom)
WHERE {?UER_ST foaf:member ?membreDeST .
?membreDeST foaf:title ?titre.
?membreDeST foaf:familyName ?nomDeFamille. FILTER(regex (str(?nomDeFamille), "Paquette", "i"))
}
RESULT:
"M. http://exemple.teluq.ca/ressources/Gilbert_Paquette"
RESULT I should have:
M.Gilbert_Paquette
The query I wrote extracts the members whose family name is Paquette. To be more precise, I need to output the person's title plus their complete name for those whose family name is Paquette. If anyone can help, it will be really appreciated.
I have a list of conditions (" df['sample'].gt(0) & df['sample'].le(92666) , df['sample'].gt(92667) & df['sample'].le (92734)")
choices is also a list ("4444", "5555")
Now when I use df['newsample'] = np.select(conditions, choices, default=0)
I keep getting the error "TypeError : invalid entry 0 in condlist: should be boolean ndarray"
But when I type the conditions in manually, it works. The issue seems to be related to the quotes when I use the list as the condlist array. How can I get around this problem?
Your conditions need to result in a boolean array. Something like
conditions = [ df['sample'] > 0, df['sample']<92666 ...]
What you have passed is a string, not a set of boolean arrays. Strings aren't evaluated as part of the call to np.select.
(" df['sample'].gt(0) & df['sample'].le(92666) , df['sample'].gt(92667) & df['sample'].le (92734)")
This is a single string: see the "". The outer () don't add anything to the expression.
("4444", "5555") # tuple of strings
is a tuple with 2 strings. For this use, both a tuple and a list are valid, so a list of numbers also works:
[4444, 5555] # list of numbers
If you typed:
conditions = (df['sample'].gt(0) & df['sample'].le(92666), df['sample'].gt(92667) & df['sample'].le(92734))
and then did a
print(conditions)
you should see a tuple with 2 boolean-valued Series.
I think you need to read some more Python basics alongside your pandas reading. Python syntax and objects like lists, strings, and tuples are just as important as pandas objects like DataFrame and Series.
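To make that concrete, here is a minimal sketch of the np.select call with real boolean conditions instead of a string. The sample values are made up for illustration, and numeric choices are used so that they share a dtype with default=0.

```python
import numpy as np
import pandas as pd

# Made-up sample values chosen to hit both ranges and the default.
df = pd.DataFrame({"sample": [0, 50, 92666, 92667, 92700, 100000]})

# Each entry is a boolean Series, not a string describing one.
conditions = [
    df["sample"].gt(0) & df["sample"].le(92666),
    df["sample"].gt(92667) & df["sample"].le(92734),
]
choices = [4444, 5555]

# Rows matching no condition receive the default value.
df["newsample"] = np.select(conditions, choices, default=0)
print(df["newsample"].tolist())
```

With the ranges as written in the question, a value of exactly 92667 matches neither condition and falls through to the default.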
On SO, I was just given two answers that both work when called a single time. Now I want to put them in a loop and loop over several rows of data. However, I'm having a heck of a time getting the code correct. I suspect it has to do with how I'm handling the double quotes.
The stand alone code lines are as follows.
Var = ActiveSheet.Evaluate("And(A1:F1)") and
Var = Application.WorksheetFunction.And(Range("A1:F1"))
For the first example I tried:
For i = 2 To 20
    Var = ActiveSheet.Evaluate("And(A & i & :F & i)")
Next i
This produces "Error 2015"
For the second:
For i = 2 To 20
    Var = Application.WorksheetFunction.And(Range("A" & i & ":F" & i))
Next i
This produces a line of red code
What am I doing wrong?
The Visual Basic Editor is making this harder than it should be, because its default syntax highlighting gives string literals the same color as identifiers.
You can change that under Tools/Options and give Identifier Text a different color, for example teal.
String literals are then still black, but identifiers look visually distinctive.
What you want to make sure is that your variables are syntax-highlighted like identifiers, so they're teal, not black, as in your second example.
Contrast that with your first attempt, where i doesn't get syntax-highlighted as the identifier it should be.
And since you know that i is a VBA variable and you want VBA to concatenate its value into this string, i being syntax-highlighted like any other string literal (and not as an identifier) is your visual cue that something's off!
Compare that with @JNevill's fixed version, where i is highlighted as an identifier.
With Identifier Text having a different syntax highlighting than string literals in the editor, it becomes much easier to quickly locate a variable that's accidentally inside a string literal.
That first snippet isn't working because ActiveSheet.Evaluate passes its parameter to Excel's expression evaluation engine, which has no idea what to do with this i. The variable i exists only in the execution context of the VBA code: only VBA code can evaluate its value.
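The same distinction can be sketched outside VBA. The Python below is only an illustration of the principle, not a replacement for the VBA fix; both function names are hypothetical, and the code only builds the text an evaluator would receive.

```python
def literal_formula(i: int) -> str:
    # Here `i` is just a character inside the string literal, as in the
    # failing attempt: the evaluator receives the letter, not the value.
    return "And(A & i & :F & i)"


def concatenated_formula(i: int) -> str:
    # Here the value of `i` is joined into the text by the host language
    # before the string is handed to anything else.
    return "And(A" + str(i) + ":F" + str(i) + ")"


# Rows 2 to 20, matching the loop bounds in the question.
formulas = [concatenated_formula(i) for i in range(2, 21)]

print(literal_formula(2))
print(formulas[0], formulas[-1])
```

Printing the string before passing it on is a quick way to confirm that the variable's value, and not its name, ended up in the formula.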
I have two fields with partly different strings. FieldA:= "String1" FieldB:= "String1; String2" (So the main difference between the two fields is the "; String2" in FieldB.) The result I want to see is also "String1; String2", but I want the first half from FieldA and the second half from FieldB. Is there any way to solve this problem using an Access SQL/VBA function?
With the assumption that your values will always contain a semi-colon, you could also use the Split function in the following way:
[FieldA] & ";" & Split([FieldB],";")(1)
Yes. Use string manipulation functions. This is a relatively simple case for string manipulation, assuming the strings are consistent with the examples given. Consistency is critical to string manipulation. Assuming there is a space following the semi-colon, try:
[FieldA] & "; " & Mid([FieldB], InStr([FieldB], ";") + 2)
The expression can be used in a query, a textbox, or VBA.
Suggest you do some research and learn about these and other string functions.
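For readers who want to check the index arithmetic, both expressions above can be mirrored in Python. This is only an illustrative translation with hypothetical function names, not Access code. Keep in mind that InStr and Mid count from 1 while Python indexes from 0.

```python
def combine_with_split(field_a: str, field_b: str) -> str:
    # Mirrors [FieldA] & ";" & Split([FieldB], ";")(1).
    # The piece after the semicolon keeps its leading space.
    return field_a + ";" + field_b.split(";")[1]


def combine_with_mid(field_a: str, field_b: str) -> str:
    # Mirrors [FieldA] & "; " & Mid([FieldB], InStr([FieldB], ";") + 2).
    # The +2 skips the semicolon and the single space that follows it.
    return field_a + "; " + field_b[field_b.index(";") + 2:]


field_a = "String1"
field_b = "String1; String2"

print(combine_with_split(field_a, field_b))
print(combine_with_mid(field_a, field_b))
```

Both variants assume FieldB always contains a semicolon; the second also assumes exactly one space after it.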
I'm a new R user, and work requires that I use R on linux. I am running into a very strange problem, and hope some of you expert users can provide a solution. :)
I have a large dataset with >200,000 observations/participants and >300 variables, that involves subsetting from various baseline datasets to create the working dataset.
My issue is that an essential variable sometimes changes when I run the length command.
"Withdrawlevel" is the variable that changes. This is how this variable should be:
describe(tbl$Withdrawlevel)
tbl$Withdrawlevel
      n missing  unique    Mean
   2833  218988       3   1.474
I then run several length commands like the following because I'm interested in getting the number of participants that meet certain criteria.
For example:
length(which(tbl[,'Reg_age_dob']>=18 & as.Date(tbl[,'QuestionnaireEndDate'])>='2013-07-21' & as.Date(tbl[,'QuestionnaireEndDate'])< '2013-07-28' & (is.na(tbl$Withdrawlevel) | (tbl$Withdrawlevel!=3) & (tbl$WithdrawDate<'2013-07-28')) | ((tbl$Withdrawlevel=3) & (tbl$WithdrawDate>='2013-07-28')) ))
And, then Withdrawlevel variable changes:
describe(tbl$Withdrawlevel)
tbl$Withdrawlevel
      n missing  unique    Mean
 221821       0       1       3
Is the length command described above doing something to this variable? My understanding is that it shouldn't. I have also run many similar commands with this data, and the issue doesn't occur after each one.
Any insight into what is going on and how I can resolve this issue?
tbl$Withdrawlevel=3 assigns the value 3 to all observations of tbl$Withdrawlevel. You meant tbl$Withdrawlevel==3.
(Joshua's answer is correct.) In the future you can protect yourself against this sort of error by using with:
with(tbl, length(which(
    Reg_age_dob >= 18 &
    as.Date(QuestionnaireEndDate) >= '2013-07-21' &
    as.Date(QuestionnaireEndDate) <  '2013-07-28' &
    (is.na(Withdrawlevel) | (Withdrawlevel != 3) & (WithdrawDate < '2013-07-28')) |
    ((Withdrawlevel == 3) & (WithdrawDate >= '2013-07-28'))  # == compares, = would assign
)))
The point is that an accidental assignment inside with() only changes a temporary copy, so it does not have the danger of corrupting your data object, and the expression is also much easier to understand.
Every expression inside your which call should be a logical (boolean) expression. Make sure to use == instead of =: == returns TRUE or FALSE, whereas = sets the variable to the value.