I am trying to "loop" over this DataFrame and add a column based on a condition involving other columns. I think this logic achieves my result, but I can't seem to find the right iterator or other method to create the new column. In this example I am using iteritems, but the code just spins for minutes with no result, so I cancel it manually. My DataFrame has 400,000 rows.
The goal is to fill the rows where ['Close'] == ['prev'], replacing the '0' value with the most recent trade signal (either '+' or '-').
for index, col in df.T.iteritems():
    if col['Close'] > col['prev']:
        col['trade2'] = '+'
        x = '+'
        continue
    elif col['Close'] < col['prev']:
        col['trade2'] = '-'
        x = '-'
        continue
    elif col['Close'] == col['prev']:
        col['trade2'] = x
I created the test DataFrame as follows:
df = pd.DataFrame(data=[
[ 36.50, 36.53, '-' ],
[ 36.53, 36.50, '+' ],
[ 36.53, 36.53, '0' ],
[ 36.53, 36.53, '0' ],
[ 36.53, 36.53, '0' ],
[ 36.51, 36.53, '-' ],
[ 36.51, 36.51, '0' ],
[ 36.53, 36.51, '+' ],
[ 36.53, 36.53, '0' ],
[ 36.53, 36.53, '0' ],
[ 36.53, 36.53, '0' ]],
columns=['Close', 'prev', 'trade'],
index=range(5,16))
To compute trade2 value for the current row, we have
to define the following function:
def cmp(row):
    global lastResult
    if row.Close > row.prev:
        lastResult = '+'
    elif row.Close < row.prev:
        lastResult = '-'
    return lastResult
It uses a global variable lastResult, which will be initially set
to '0'.
Instead of your loop with iteritems, just apply this function
to each row and assign the result to the new column,
starting from the initial setting of lastResult (as mentioned above):
lastResult = '0'
df['trade2'] = df.apply(cmp, axis=1)
This is the idiomatic way to perform operations of this type.
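Assembled into one runnable snippet (the answer's code applied to the test DataFrame above), the new column comes out with every '0' replaced by the preceding signal:

```python
import pandas as pd

df = pd.DataFrame(
    data=[[36.50, 36.53, '-'], [36.53, 36.50, '+'], [36.53, 36.53, '0'],
          [36.53, 36.53, '0'], [36.53, 36.53, '0'], [36.51, 36.53, '-'],
          [36.51, 36.51, '0'], [36.53, 36.51, '+'], [36.53, 36.53, '0'],
          [36.53, 36.53, '0'], [36.53, 36.53, '0']],
    columns=['Close', 'prev', 'trade'],
    index=range(5, 16))

def cmp(row):
    global lastResult
    if row.Close > row.prev:
        lastResult = '+'
    elif row.Close < row.prev:
        lastResult = '-'
    return lastResult          # unchanged rows keep the last signal

lastResult = '0'
df['trade2'] = df.apply(cmp, axis=1)
print(df['trade2'].tolist())
# ['-', '+', '+', '+', '+', '-', '-', '+', '+', '+', '+']
```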
One remark concerning your code. If you do use iteritems, then:
- within the loop, you should append results to a list;
- after the loop, assign this list to a new column.
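A minimal sketch of that pattern; note it uses iterrows here, which visits the rows directly, instead of iteritems on the transpose:

```python
import pandas as pd

df = pd.DataFrame(
    data=[[36.50, 36.53], [36.53, 36.50], [36.53, 36.53], [36.51, 36.53]],
    columns=['Close', 'prev'])

results = []
x = '0'  # last seen signal; stays '0' until the first +/- appears
for _, row in df.iterrows():
    if row['Close'] > row['prev']:
        x = '+'
    elif row['Close'] < row['prev']:
        x = '-'
    results.append(x)

df['trade2'] = results  # assign the collected list as the new column
print(df['trade2'].tolist())  # ['-', '+', '+', '-']
```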
Another, quicker solution
If you already have the trade column, you can execute:
df['trade2'] = df.trade.replace(to_replace='0')
This method relies on the fact that the trade column is always a string
(either a minus, a plus, or '0'), so the to_replace argument will match the (string) zero cases.
The next argument, value, has not been given, so it defaults to None.
In these circumstances, the pad replace method is assumed, meaning:
fill values forward.
This way, either '+' or '-' is "replicated" down the column, giving just
the same result.
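For reference, the same forward fill can be spelled out explicitly with mask and ffill; this is a useful alternative sketch, since newer pandas versions deprecate the implicit pad behavior of replace:

```python
import pandas as pd

trade = pd.Series(['-', '+', '0', '0', '-', '0'])

# trade.replace(to_replace='0') pads forward because value defaults to None;
# the explicit spelling: blank out the zeros, then fill forward.
trade2 = trade.mask(trade == '0').ffill()
print(trade2.tolist())  # ['-', '+', '+', '+', '-', '-']
```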
For a source DataFrame with 1100 rows (your DataFrame replicated 100 times),
I got an execution time over 100 times shorter than with the first method.
I have the following list of dictionaries, with sub-dictionaries as data:
data2 = [
    {"dep": None},
    {"dep": {
        "eid": "b3ca7ddc-0d0b-4932-816b-e74040a770ec",
        "nid": "fae15b05-e869-4403-ae80-6e8892a9dbde",
    }},
    {"dep": None},
    {"dep": {
        "eid": "c3bcaef7-e3b0-40b6-8ad6-cbdb35cd18ed",
        "nid": "6a79c93f-286c-4133-b620-66d35389480f",
    }},
]
And I have a match key:
match_key = "b3ca7ddc-0d0b-4932-816b-e74040a770ec"
And I want to see if any sub-dictionaries of each "dep" key in data2 have an eid that matches my match_key. I'm trying the following, but I get a TypeError: string indices must be integers. Where am I going wrong?
My Code
matches = [
    d["eid"]
    for item in data2
    if item["dep"]
    for d in item["dep"]
    if d["eid"] == match_key
]
So matches should return:
["b3ca7ddc-0d0b-4932-816b-e74040a770ec"]
Meaning it found this id in data2.
When you iterate over a dictionary, each iteration gives you a key from the dictionary.
So d["eid"] is actually "eid"["eid"], which is an invalid expression. That's why Python raises the following exception:
TypeError: string indices must be integers
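The failure can be reproduced in isolation:

```python
dep = {"eid": "abc", "nid": "xyz"}

# Iterating a dict yields its keys, which are strings here:
keys = [d for d in dep]
print(keys)  # ['eid', 'nid']

# So d["eid"] inside the loop is really "eid"["eid"]: a string indexed
# with a string, which raises TypeError.
try:
    "eid"["eid"]
except TypeError as exc:
    print("TypeError:", exc)
```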
Also, the expression d["eid"] assumes that every d contains the eid key. If it doesn't, Python will raise a KeyError.
If you don't know for sure that "eid" is a valid key in the dictionary, prefer using the .get method instead.
matches = [
    v
    for item in data2
    if item.get("dep")  # is there a "dep" key holding a non-falsy value?
    for k, v in item["dep"].items()  # iterate over the dictionary items
    if k == "eid" and v == match_key
]
You can do even better by directly accessing the value of eid key:
matches = [
    d["dep"]["eid"]
    for d in data2
    if d.get("dep") and d["dep"].get("eid") == match_key
]
I have the following code that suffers from a race condition. Sometimes I can see the results, sometimes I cannot:
import { from } from 'rxjs';
import { delay, withLatestFrom } from 'rxjs/operators';

const op1 = () => {
  const filesObs = from(['a', 'b', 'c']).pipe(delay(200));
  return (obs) => {
    return obs
      .pipe(delay(100))
      .pipe(withLatestFrom(filesObs));
  };
};

from([1, 2, 3, 4, 5]).pipe(op1()).subscribe(console.log);
As it is, I don't see anything printed. But if I increase the second delay to 300, I see the expected values:
[ 1, 'c' ]
[ 2, 'c' ]
[ 3, 'c' ]
[ 4, 'c' ]
[ 5, 'c' ]
Is there a way to always see the result by using observeOn or subscribeOn somewhere on my code or should I follow another best practice?
First of all, this is not an issue specific to withLatestFrom being used inside an operator.
The code below, which is not inside an operator, prints nothing either (it faces the same issue):
const filesObs = from(['a','b','c']).pipe(delay(200));
from([1,2,3,4,5]).pipe(delay(100),withLatestFrom(filesObs)).subscribe(console.log);
According to the desired output given in the question, we need to pair the last element of the letters stream with each value from the numbers stream. But what withLatestFrom() takes is the most recently emitted element at each point in time. To see this, add a delay between the emitted elements of the letters stream (first line):
//adding a delay on each of the emitted letters.
const filesObs = from(['a','b','c']).pipe(concatMap((v)=>of(v).pipe(delay(50))))
from([1,2,3,4,5]).pipe(delay(100),withLatestFrom(filesObs)).subscribe(console.log);
[ 1, 'a' ]
[ 2, 'a' ]
[ 3, 'a' ]
[ 4, 'a' ]
[ 5, 'a' ]
As you can see the above is not the desired output.
Also note that withLatestFrom() skips source values while nothing has been emitted yet on the observable argument; this is documented behavior rather than a bug. See below how it skips the first number (because at the moment it is emitted, nothing has been emitted yet on filesObs):
const filesObs = from(['a','b','c']).pipe(concatMap((v)=>of(v).pipe(delay(50))))
//Now adding a delay on each of the emitted numbers.
from([1,2,3,4,5]).pipe(concatMap((v)=>of(v).pipe(delay(25)))).pipe(withLatestFrom(filesObs)).subscribe(console.log);
[ 2, 'a' ]
[ 3, 'a' ]
[ 4, 'b' ]
[ 5, 'b' ]
Solution
One solution to the problem is to take the last() element of the letters stream and repeat() it. Then map each number with the first() element of filesObs, which will now always be the last one ('c'):
const filesObs = from(['a', 'b', 'c']).pipe(delay(200), last(), repeat());

from([1, 2, 3, 4, 5])
  .pipe(delay(100))
  .pipe(mergeMap(v => filesObs.pipe(first(), map(v2 => [v2, v]))))
  .subscribe(console.log);
And the same inside an operator:
const op1 = () => {
  const filesObs = from(['a', 'b', 'c']).pipe(delay(200), last(), repeat());
  return (obs) => {
    return obs
      .pipe(delay(100))
      .pipe(mergeMap(v => filesObs.pipe(first(), map(v2 => [v2, v]))));
  };
};

from([1, 2, 3, 4, 5]).pipe(op1()).subscribe(console.log);
Both of the above will output the values below, independent of the delay values:
[ 'c', 1 ]
[ 'c', 2 ]
[ 'c', 3 ]
[ 'c', 4 ]
[ 'c', 5 ]
I have a map of elements:
elemA1: value
elemB1: value
elemC1: value
...
elemA99: value
elemB99: value
elemC99: value
...
elemA7823: value
elemB7823: value
elemD7823: value
I want to use groupBy to group each set of elements by number.
The number will always be at the end of the key, but my problem is that the number can be any number of characters.
Just have the groupBy closure extract the part of the key you want to group by. Here I'm using the regular expression /\d+$/ to get digits at the end of the key.
def map = [
elemA1: "1",
elemB1: "B1",
elemA99: "A99",
elemB99: "B99"
]
map.groupBy { ( it.key =~ /\d+$/ )[0] } // [1:[elemA1:1, elemB1:B1], 99:[elemA99:A99, elemB99:B99]]
I want to fetch the common elements from multiple arrays. The number of result arrays will vary, depending on the number of tags in array a[].
As a first step, my query and the result I get are shown below:
LET a = ["Men", "Women", "Accessories"]
LET c = (
    FOR i IN a
        LET d = CONCAT("Tags/", i)
        RETURN d
)
FOR i IN c
    LET m = (
        FOR y IN OUTBOUND i TC
            RETURN y._key
    )
    RETURN m
and result I get is:
[
[
"C1",
"C5",
"C7",
"C3"
],
[
"C2",
"C5",
"C6",
"C4"
],
[
"C7",
"C5",
"C6"
]
]
From this result, I want only the common element, i.e. "C5" here.
How can I get that?
This question has also been asked and answered on GitHub.
The function INTERSECTION() returns the intersection of all specified arrays, and APPLY() is used to pass a dynamic number of nested arrays.
The query
let D = [["C1","C5","C7","C3"],["C2","C5","C6","C4"],["C7","C5","C6"]]
RETURN APPLY("INTERSECTION", D)
results in:
[
[
"C5"
]
]
Given a graph via "contains" I have the following:
D contains LibD
C contains LibC and D
B contains LibB and D
A contains LibA, B, and C
Using:
FOR v,e,p IN 1..50 INBOUND 'pmconfig/899018092734' pm_content RETURN p.vertices
I get the following paths:
A->B->LibB
A->B->D->LibD
A->B->D
A->B
A->LibA
A->C->LibC
A->C->D->LibD
A->C->D
A->C
I'd like to filter out intermediate points so I get:
A->B->LibB
A->B->D->LibD
A->LibA
A->C->LibC
A->C->D->LibD
If the LibX elements were leaves, I could add a filter like
FILTER LENGTH(EDGES(pm_content,v._id,'inbound'))==0
But suppose I had the path A->B->C->D->B.
In this case I would filter everything out. What I would like to have is
A->B->C->D, since the walk should stop when it recognizes a cycle.
How can I construct a filter that removes intermediate points, keeping only paths that end at a leaf node or whose content links all point to vertices that were already traversed?
To filter out the "unfinished paths", we need to predict whether the traverser would be able to continue its journey further down the graph.
The only way to find out is to try: so we add a second traversal originating from the current vertex (v) in a subquery, which goes one step at most.
The subquery will return one of two possible results: [1] if there are more nodes, [] if not. We can test for this using the LENGTH() function.
We then use this information to filter the unfinished paths from the result:
FOR v, e, p IN 1..50 INBOUND 'pmconfig/899018092734' pm_content
    LET next = (FOR x IN 1 INBOUND v pm_content LIMIT 1 RETURN 1)
    FILTER LENGTH(next) == 0
    RETURN p.vertices
Let's test this on the Traversal Graph. We change the direction to OUTBOUND, since that easily gives us more results, and restrict the output to just the _key values so the result can be checked without problems.
var examples = require("org/arangodb/graph-examples/example-graph.js");
var graph = examples.loadGraph("traversalGraph");
db._query(`
FOR v,e,p IN 1..50 OUTBOUND 'circles/A' GRAPH 'traversalGraph'
LET next = (FOR x IN 1 OUTBOUND v GRAPH 'traversalGraph' LIMIT 1 RETURN 1)
FILTER LENGTH(next) == 0
RETURN p.vertices[*]._key
`).toArray()
[
[ "A", "B", "C", "D" ],
[ "A", "B", "E", "F" ],
[ "A", "G", "H", "I" ],
[ "A", "G", "J", "K" ]
]
As we can see, we only get the paths to the endpoints - as expected.