car={"po1":50,"po2":"-","po3":15,"po4":"+","po5":12}
vocar = list(car.keys())
inter=0
def cal(car,vocar,inter):
    while len(car)!=1:
        for inter in range(len(car)):
            if car.get(vocar[inter],0)=="+":
                new=car.get(vocar[inter-1])+car.get(vocar[inter+1])
                car.pop(vocar[inter])
                car.pop(vocar[inter+1])
                car.update({vocar[inter-1]:new})
                car1=car
                vocar1=list(car1.keys())
                inter1=0
                cal(car1,vocar1,inter1)
            elif car.get(vocar[inter],0)=="-":
                new=car.get(vocar[inter-1])-car.get(vocar[inter+1])
                car.pop(vocar[inter])
                car.pop(vocar[inter+1])
                car.update({vocar[inter-1]:new})
                car1=car
                vocar1=list(car1.keys())
                inter1=0
                cal(car1,vocar1,inter1)
    print(car)
cal(car,vocar,inter)
I keep getting a KeyError even though I get the result I wanted, which is {'po1': 47}.
The error only appears after everything else is done. Please help!
First, about this line:
while len(car)!=1
You pop items from car and call the function recursively, so the while loop is not needed. You can use a guard instead:
if not car:
    return
for inter in range(len(car)):
    # ....
When you make a loop like this:
for inter in range(len(car))
that means:
| loop | inter |
|:----:|:-----:|
| 1 | 0 | car => [X, X, X ,X ,X]
| 2 | 1 | car => [X, X, X ,X]
| 3 | 2 | car => [X, X, X] ! ERR !
| 4 | 3 |
| 5 | 4 |
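range(len(car)) is evaluated only once, while car keeps shrinking, so at some point (loop 3 here) the index no longer matches an existing item and you get an error (maybe ;))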
You can iterate over the dict itself instead:
for inter in car:
With that change, you don't need vocar anymore:
if car.get(inter,0)=="+":
Note the types: some values are int (50) and others are str ("-"), and you can't use the + operator between an int and a str.
Before Python 3.7, dictionaries were not guaranteed to keep insertion order.
That means the arrangement of the items may change every time you run it! So I changed car to a list of tuples:
car = [("po1",50), ("po2","-"),("po3",15),("po4","+"),("po5",12)]
Now that car is a list, we can do this:
if inter[1] == "+":
    car[head-1] = (inter[0], car[head-1][1]+car[head+1][1])
Finally, we MUST remove car[head+1] first and only then remove inter (the operator), so that the index head+1 still points at the right operand.
car = [("po1",50), ("po2","-"),("po3",15),("po4","+"),("po5",12)]
def cal(car):
    head = 0
    if not car:
        return
    for inter in car:
        if inter[1] == "+":
            car[head-1] = (inter[0], car[head-1][1]+car[head+1][1])
            car.remove(car[head+1])
            car.remove(inter)
            cal(car)
        elif inter[1] =="-":
            car[head-1] = (inter[0], car[head-1][1]-car[head+1][1])
            car.remove(car[head+1])
            car.remove(inter)
            cal(car)
        head += 1
    return car[0][1]
print(cal(car))
OUT:
===
47
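For comparison, here is a minimal sketch of a non-recursive variant that never mutates the list while iterating over it. It assumes the input is a flat sequence alternating numbers and "+"/"-" operators, evaluated strictly left to right. The names evaluate, tokens and OPS are illustrative and not from the question.

```python
import operator

# Map each supported operator symbol to the function that applies it.
OPS = {"+": operator.add, "-": operator.sub}

def evaluate(tokens):
    """Evaluate [number, op, number, op, ...] strictly left to right."""
    if not tokens:
        return None
    result = tokens[0]
    # Walk the remaining tokens as (operator, operand) pairs.
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        result = OPS[op](result, operand)
    return result

car = [("po1", 50), ("po2", "-"), ("po3", 15), ("po4", "+"), ("po5", 12)]
print(evaluate([value for _, value in car]))  # 47
```

Because nothing is removed from the list, there is no index bookkeeping and no need for recursion.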
I have two lists of different types, and I want to update the 'Check' value in list 2 whenever the 'Brand' from list 2 is found in list 1.
-------------------- --------------------
| Name | Brand | | Brand | Check |
-------------------- --------------------
| vga x | Asus | | MSI | X |
| vga b | Asus | | ASUS | - |
| mobo x | MSI | | KINGSTON | - |
| memory | Kingston| | SAMSUNG | - |
-------------------- --------------------
So usually I just do:
for(x in list1){
    for(y in list2){
        if(y.brand == x.brand){
            y.check = true
        }
    }
}
Is there a simpler solution for that?
Since you're mutating the objects, it doesn't really get any cleaner than what you have. It can be done using any like this, but in my opinion it is not any clearer to read:
list2.forEach { bar ->
    bar.check = bar.check || list1.any { it.brand == bar.brand }
}
The above is slightly more efficient than what you have since it inverts the iteration of the two lists so you don't have to check every element of list1 unless it's necessary. The same could be done with yours like this:
for(x in list2){
    for(y in list1){
        if(y.brand == x.brand){
            x.check = true
            break
        }
    }
}
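If list 1 is large, another option is to collect its brands into a set once and then do a single pass over list 2. The sketch below shows the idea in Python for brevity (in Kotlin, toSet() would play the same role). The classes Part and BrandCheck are hypothetical stand-ins for the two item types, and matching is assumed to be exact, so "Asus" and "ASUS" count as different brands.

```python
from dataclasses import dataclass

@dataclass
class Part:            # hypothetical stand-in for the items of list 1
    name: str
    brand: str

@dataclass
class BrandCheck:      # hypothetical stand-in for the items of list 2
    brand: str
    check: bool = False

def mark_known_brands(parts, brand_checks):
    """Set check=True on every entry whose brand occurs in parts."""
    known = {part.brand for part in parts}   # built once, O(1) lookups
    for entry in brand_checks:
        if entry.brand in known:
            entry.check = True               # entries already True stay True
    return brand_checks

list1 = [Part("vga x", "Asus"), Part("vga b", "Asus"),
         Part("mobo x", "MSI"), Part("memory", "Kingston")]
list2 = [BrandCheck("MSI"), BrandCheck("ASUS"),
         BrandCheck("KINGSTON"), BrandCheck("SAMSUNG")]

mark_known_brands(list1, list2)
print([(entry.brand, entry.check) for entry in list2])
```

With exact matching only MSI is marked. Normalizing both sides (for example with lower()) would be needed if casing should be ignored.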
data class Item(val name: String, val brand: String)
fun main() {
    val list1 = listOf(
        Item("vga_x", "Asus"),
        Item("vga_b", "Asus"),
        Item("mobo_x", "MSI"),
        Item("memory", "Kingston")
    )
    val list2 = listOf(
        Item("", "MSI"),
        Item("", "ASUS"),
        Item("", "KINGSTON"),
        Item("", "SAMSUNG")
    )
    // Get intersections
    val intersections = list1.map{it.brand}.intersect(list2.map{it.brand})
    println(intersections)
    // Returns => [MSI]
    // Has any intersections
    val intersected = list1.map{it.brand}.any { it in list2.map{it.brand} }
    println(intersected)
    // Returns ==> true
}
UPDATE: I just saw that this isn't a solution to your problem, but I'll leave it here.
I would like to use Spark to parse network messages and group them into logical entities in a stateful manner.
Problem Description
Let's assume each message is in one row of an input dataframe, depicted below.
| row | time | raw payload |
+-------+------+---------------+
| 1 | 10 | TEXT1; |
| 2 | 20 | TEXT2;TEXT3; |
| 3 | 30 | LONG- |
| 4 | 40 | TEXT1; |
| 5 | 50 | TEXT4;TEXT5;L |
| 6 | 60 | ONG |
| 7 | 70 | -TEX |
| 8 | 80 | T2; |
The task is to parse the logical messages in the raw payload, and provide them in a new output dataframe. In the example each logical message in the payload ends with a semicolon (delimiter).
The desired output dataframe could then look as follows:
| row | time | message |
+-------+------+---------------+
| 1 | 10 | TEXT1; |
| 2 | 20 | TEXT2; |
| 3 | 20 | TEXT3; |
| 4 | 30 | LONG-TEXT1; |
| 5 | 50 | TEXT4; |
| 6 | 50 | TEXT5; |
| 7 | 50 | LONG-TEXT2; |
Note that some message rows do not yield a new row in the result (e.g. rows 4, 6, 7, 8), and some yield multiple rows (e.g. rows 2, 5).
My questions:
1. Is this a use case for a UDAF? If so, how should I implement the merge function, for example? I have no idea what its purpose is.
2. Since the message ordering matters (I cannot process LONG-TEXT1 and LONG-TEXT2 properly without respecting the message order), can I tell Spark to parallelize at a higher level (e.g. per calendar day of messages) but not within a day (e.g. events at time 50, 60, 70, 80 need to be processed in order)?
3. Follow-up question: is it conceivable that the solution will be usable not just in traditional Spark, but also in Spark Structured Streaming? Or does the latter require its own kind of stateful processing method?
Generally, you can run arbitrary stateful aggregations on Spark streaming by using mapGroupsWithState or flatMapGroupsWithState. You can find some examples here. None of those, though, will guarantee that the stream is processed in event-time order.
If you need to enforce data ordering, you should try to use window operations on event time. In that case you run stateless operations instead, but if the number of elements in each window group is small enough, you can use collect_list, for instance, and then apply a UDF to each list (where you can manage the state for each window group).
OK, in the meantime I figured out how to do this with a UDAF.
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

class TagParser extends UserDefinedAggregateFunction {
  override def inputSchema: StructType = StructType(StructField("value", StringType) :: Nil)
  // buffer(0): messages completed by the current row; buffer(1): unterminated remainder
  override def bufferSchema: StructType = StructType(
    StructField("parsed", ArrayType(StringType)) ::
      StructField("rest", StringType)
      :: Nil)
  override def dataType: DataType = ArrayType(StringType)
  override def deterministic: Boolean = true
  override def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer(0) = IndexedSeq[String]()
    buffer(1) = null
  }
  def doParse(str: String, buffer: MutableAggregationBuffer): Unit = {
    buffer(0) = IndexedSeq[String]()
    val prevRest = buffer(1)
    var idx = -1
    // Prepend whatever was left unterminated by the previous rows
    val strToParse = if (prevRest != null) prevRest + str else str
    do {
      val oldIdx = idx;
      idx = strToParse.indexOf(';', oldIdx + 1)
      if (idx == -1) {
        // No further delimiter: keep the tail for the next row
        buffer(1) = strToParse.substring(oldIdx + 1)
      } else {
        val newlyParsed = strToParse.substring(oldIdx + 1, idx)
        buffer(0) = buffer(0).asInstanceOf[IndexedSeq[String]] :+ newlyParsed
        buffer(1) = null
      }
    } while (idx != -1)
  }
  override def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    if (buffer == null) {
      return
    }
    doParse(input.getAs[String](0), buffer)
  }
  override def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = throw new UnsupportedOperationException
  override def evaluate(buffer: Row): Any = buffer(0)
}
Here is a demo app that uses the above UDAF to solve the problem described above:
case class Packet(time: Int, payload: String)
object TagParserApp extends App {
  val spark, sc = ... // kept out for brevity
  val df = sc.parallelize(List(
    Packet(10, "TEXT1;"),
    Packet(20, "TEXT2;TEXT3;"),
    Packet(30, "LONG-"),
    Packet(40, "TEXT1;"),
    Packet(50, "TEXT4;TEXT5;L"),
    Packet(60, "ONG"),
    Packet(70, "-TEX"),
    Packet(80, "T2;")
  )).toDF()
  val tp = new TagParser
  val window = Window.rowsBetween(Window.unboundedPreceding, Window.currentRow)
  val df2 = df.withColumn("msg", tp.apply(df.col("payload")).over(window))
  df2.show()
}
this yields:
+----+-------------+--------------+
|time| payload| msg|
+----+-------------+--------------+
| 10| TEXT1;| [TEXT1]|
| 20| TEXT2;TEXT3;|[TEXT2, TEXT3]|
| 30| LONG-| []|
| 40| TEXT1;| [LONG-TEXT1]|
| 50|TEXT4;TEXT5;L|[TEXT4, TEXT5]|
| 60| ONG| []|
| 70| -TEX| []|
| 80| T2;| [LONG-TEXT2]|
+----+-------------+--------------+
The main issue for me was figuring out how to actually apply this UDAF, namely like this:
df.withColumn("msg", tp.apply(df.col("payload")).over(window))
The only thing I still need to figure out is parallelization (which I only want to happen where we do not rely on ordering), but that's a separate issue for me.
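Independent of Spark, the buffering idea behind doParse can be sketched in a few lines of plain Python: carry the unterminated remainder from one payload to the next and emit only the pieces that end with the delimiter. The function names parse_chunk and parse_stream are illustrative, and the sketch assumes the payloads arrive already ordered by time.

```python
def parse_chunk(rest, payload, delimiter=";"):
    """Return (complete messages, new unterminated remainder)."""
    parts = (rest + payload).split(delimiter)
    # Everything before the last delimiter is complete; the tail is carried over.
    return parts[:-1], parts[-1]

def parse_stream(payloads, delimiter=";"):
    """Return, for each payload, the list of messages it completes."""
    rest = ""
    completed = []
    for payload in payloads:
        messages, rest = parse_chunk(rest, payload, delimiter)
        completed.append(messages)
    return completed

payloads = ["TEXT1;", "TEXT2;TEXT3;", "LONG-", "TEXT1;",
            "TEXT4;TEXT5;L", "ONG", "-TEX", "T2;"]
for payload, messages in zip(payloads, parse_stream(payloads)):
    print(payload, messages)
```

The per-payload lists correspond to the msg column shown above.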
I'm using .format() to autogenerate a menu. But I also need to format it as users run more tests to indicate those tests are already done.
Example test dict:
menuDict = {
"1":
{"testDataDict": "testDataDict1",
"testName": "testName1",
"testGroupName":"testGroupName1"},
"2":
{"testDataDict": "testDataDict2",
"testName": "testName2",
"testGroupName":"testGroupName2"
},
"3":
{"testDataDict": "testDataDict3",
"testName": "testName3",
"testGroupName":"testGroupName3"
},
"4":
{"testDataDict": "testDataDict4",
"testName": "testName4",
"testGroupName":"testGroupName3"
}
}
Actual code:
def menuAutoCreate(menuDict):
    testGroupDict = {}
    for testNum in menuDict.keys():
        try:
            testGroupDict[menuDict[testNum]["testGroupName"]].append(testNum)
        except:
            testGroupDict[menuDict[testNum]["testGroupName"]] = [testNum]
    #Groups the tests under the group names
    from natsort import natsorted as nt
    testGroupNamesList = nt(testGroupDict.keys(), key=lambda y: y.lower())
    #Naturally sorts group names so they look orderly
    textDump = " "
    i = 0
    while i < len(testGroupNamesList):
        howManyLinesEven = 0
        evenList = []
        howManyLinesOdd = 0
        oddList = []
        testGroupNameEven = testGroupNamesList[i]
        textDump += "|{:44} |".format(testGroupNameEven)
        howManyLinesEven = len(testGroupDict[testGroupNameEven])
        evenList = nt(testGroupDict[testGroupNameEven], key=lambda y: y.lower())
        #If it's an even number, it puts the menu template on the left side of the screen
        if i != len(testGroupNamesList)-1:
            testGroupNameOdd = testGroupNamesList[i+1]
            textDump += "{:45} |".format(testGroupNameOdd) + "\n"
            howManyLinesOdd = len(testGroupDict[testGroupNameOdd])
            oddList = nt(testGroupDict[testGroupNameOdd], key=lambda y: y.lower())
        #If it's odd, on the right side.
        if i == len(testGroupNamesList)-1:
            textDump += "{:45} |".format("") + "\n"
        #Ensures everything is correctly whitespaced
        howManyLines = max(howManyLinesEven, howManyLinesOdd)
        #Checks how many lines there are, so if a group has less tests, it will have extra whitespaces
        for line in range(howManyLines):
            if line < howManyLinesEven:
                data = {"testNum": evenList[line], "testName": menuDict[evenList[line]]["testName"]}
                textDump += "|({d[testNum]}) {d[testName]:40} {{doneTests[{d[testNum]!r}]:^8}} |".format(d=data)
            else:
                textDump += "|{:44} |".format("")
            if line < howManyLinesOdd:
                data = {"testNum": oddList[line], "testName": menuDict[oddList[line]]["testName"]}
                textDump += "({d[testNum]}) {d[testName]:41} {{doneTests[{d[testNum]!r}]:^8}} |".format(d=data) + "\n"
            else:
                textDump += "{:45} |".format("") + "\n"
        #Automatically creates a menu
        i += 2
    print(textDump)
    print("\n")
Output of this, as expected:
|testGroupName1 |testGroupName2 |
|(1) testName1 {doneTests['1']:^8} |(2) testName2 {doneTests['2']:^8} |
|testGroupName3 | |
|(3) testName3 {doneTests['3']:^8} | |
|(4) testName4 {doneTests['4']:^8} | | |
This last step will be done elsewhere, but put here for demonstration:
doneTests = {}
for testNum in menuDict.keys():
    doneTests[testNum] = "(-)"
print(doneTests)
#textDump.format(**doneTests)
#This doesn't work for some reason?
textDump.format(doneTests = doneTests)
#This step will be repeated as the user does more tests, as an indicator of
#which tests are completed.
The expected output would be this:
|testGroupName1 |testGroupName2 |
|(1) testName1 (-) |(2) testName2 (-) |
|testGroupName3 | |
|(3) testName3 (-) | |
|(4) testName4 (-) | | |
But here it throws a:
KeyError: "'1'"
If you remove !r from:
{{doneTests[{d[testNum]!r}]:^8}}
It throws a
KeyError: 1
instead.
I tried formatting with !s. Using lists/tuples. Adding and removing brackets. Out of ideas at this point...
Just tried your example.
I used the function sorted() instead of natsorted() and added the line
textDump = ''
to initialize the textDump variable before the line
i = 0
As a result I got no errors and got the expected output.
EDIT
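Now I have reproduced your error. I removed !r from {{doneTests[{d[testNum]!r}]:^8}} and used integer keys in the doneTests variable
doneTests[int(testNum)] = "(-)"
to solve the problem. The origin of the problem is how the format() method parses a field such as doneTests[1]: an index made only of digits is looked up as an integer, while anything else (including the quotes in '1') is taken literally as a string key. That is why you get KeyError: 1 without !r and KeyError: "'1'" with it.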
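The lookup behaviour can be checked in isolation with a short sketch. The helper try_format and the "t"-prefixed keys are illustrative assumptions, not part of the original code. The prefixed keys are just one alternative to integer keys, because they no longer look like numbers to format().

```python
done_tests = {"1": "(-)", "2": "(-)"}

def try_format(template, mapping):
    """Return the formatted text, or the KeyError that format() raised."""
    try:
        return template.format(doneTests=mapping)
    except KeyError as exc:
        return exc

# Digits-only index: looked up as the int 1, which is not a key of done_tests.
err_int = try_format("{doneTests[1]:^8}", done_tests)
# Quoted index: looked up as the three-character string "'1'".
err_quoted = try_format("{doneTests['1']:^8}", done_tests)

# Fix used in the answer: integer keys.
int_keys = {int(key): value for key, value in done_tests.items()}
ok_int = try_format("{doneTests[1]:^8}", int_keys)

# Alternative (assumption): string keys that do not look like numbers.
prefixed = {"t" + key: value for key, value in done_tests.items()}
ok_prefixed = try_format("{doneTests[t1]:^8}", prefixed)

print(repr(err_int), repr(err_quoted), repr(ok_int), repr(ok_prefixed))
```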
I am trying to check whether an array is a subset of another and use the result in another query. I got the comparison function working. However, if I use it in another query, I get an error saying "Left and right side of the relational operator must be scalars". This hints that comparearrays is not returning a scalar. Any ideas?
let x = parsejson('["a", "b", "c"]');
let y = parsejson('["a", "b", "c"]');
let z = parsejson('["b","a"]');
let comparearrays = (arr1:dynamic, arr2:dynamic)
{
let arr1Length = arraylength(arr1);
let total =
range s from 0 to arr1Length-1 step 1
| project dat = iff(arr1[s] in (arr2), true , false)
| where dat == true
| count;
total | extend isEqual= iff(Count == arr1Length,'true','false') | project
tostring(isEqual)
};
//comparearrays(z, x)
datatable (i:int) [4] | project i | where comparearrays(x,y) == 'true'
You are correct in your understanding - the current implementation returns a table with a single row and single column, but have no fear - toscalar to the rescue:
let x = parsejson('["a", "b", "c"]');
let y = parsejson('["a", "b", "c"]');
let z = parsejson('["b","a"]');
let comparearrays = (arr1:dynamic, arr2:dynamic)
{
let arr1Length = arraylength(arr1);
let result =
range s from 0 to arr1Length-1 step 1
| project dat = iff(arr1[s] in (arr2), true , false)
| where dat == true
| count
| extend isEqual = iff(Count == arr1Length,'true','false')
| project tostring(isEqual);
toscalar(result)
};
//comparearrays(z, x)
datatable (i:int) [4] | project i | where comparearrays(x,y) == 'true'
Note that comparearrays(z, x) returns true. That is what a subset check should return, but it is a bug if the function is meant to test equality (as the column name isEqual suggests).
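The difference between the two readings is easy to see outside Kusto. The following Python sketch mirrors the counting logic (every element of arr1 must occur in arr2) and contrasts it with a check in both directions. The function names is_subset and same_elements are illustrative.

```python
def is_subset(arr1, arr2):
    """True if every element of arr1 occurs in arr2 (what the query counts)."""
    return all(item in arr2 for item in arr1)

def same_elements(arr1, arr2):
    """True if each array is a subset of the other (order and repeats ignored)."""
    return is_subset(arr1, arr2) and is_subset(arr2, arr1)

x = ["a", "b", "c"]
y = ["a", "b", "c"]
z = ["b", "a"]

print(is_subset(z, x))      # True: z is a subset of x
print(same_elements(z, x))  # False: x contains "c", z does not
print(same_elements(x, y))  # True
```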
I am developing a Spark Streaming application where I want to have one global numeric ID per item in my data stream. Having an interval/RDD-local ID is trivial:
dstream.transform(_.zipWithIndex).map(_.swap)
This will result in a DStream like:
// key: 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 || 0 | 1 | 2 | 3 | 4 || 0
// val: a | b | c | d | e | f | g | h | i || j | k | l | m | n || o
(where the double bar || indicates the beginning of a new RDD).
What I finally want to have is:
// key: 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 || 9 | 10 | 11 | 12 | 13 || 14
// val: a | b | c | d | e | f | g | h | i || j | k | l | m | n || o
How can I do that in a safe and performant way?
This seems like a trivial task, but I find it very hard to preserve state (state = "number of items seen so far") between RDDs. Here are two approaches I tried, both updating the number of items seen so far (plus the number in the current interval) using updateStateByKey with a bogus key:
val intervalItemCounts = inputStream.count().map((1, _))
// intervalItemCounts looks like:
// K: 1 || 1 || 1
// V: 9 || 5 || 1
val updateCountState: (Seq[Long], Option[ItemCount]) => Option[ItemCount] =
(itemCounts, maybePreviousState) => {
val previousState = maybePreviousState.getOrElse((0L, 0L))
val previousItemCount = previousState._2
Some((previousItemCount, previousItemCount + itemCounts.head))
}
val totalNumSeenItems: DStream[ItemCount] = intervalItemCounts.
updateStateByKey(updateCountState).map(_._2)
// totalNumSeenItems looks like:
// V: (0,9) || (9,14) || (14,15)
// The first approach uses a cartesian product with the
// 1-element state DStream. (Is this performant?)
val increaseRDDIndex1: (RDD[(Long, Char)], RDD[ItemCount]) =>
RDD[(Long, Char)] =
(streamData, totalCount) => {
val product = streamData.cartesian(totalCount)
product.map(dataAndOffset => {
val ((localIndex: Long, data: Char),
(offset: Long, _)) = dataAndOffset
(localIndex + offset, data)
})
}
val globallyIndexedItems1: DStream[(Long, Char)] = inputStream.
transformWith(totalNumSeenItems, increaseRDDIndex1)
// The second approach uses a take() output operation on the
// 1-element state DStream beforehand. (Is this valid?? Will
// the closure be serialized and shipped in every interval?)
val increaseRDDIndex2: (RDD[(Long, Char)], RDD[ItemCount]) =>
RDD[(Long, Char)] = (streamData, totalCount) => {
val offset = totalCount.take(1).head._1
streamData.map(keyValue => (keyValue._1 + offset, keyValue._2))
}
val globallyIndexedItems2: DStream[(Long, Char)] = inputStream.
transformWith(totalNumSeenItems, increaseRDDIndex2)
Both approaches give the correct result (with local[*] master), but I am wondering about performance (shuffle etc.), whether it works in a truly distributed environment and whether it shouldn't be a lot easier than that...
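Stripped of the Spark machinery, both approaches compute the same thing: an exclusive running sum of the per-interval counts, which is then added to each local index. A plain-Python sketch of that arithmetic (not a Spark implementation, and it says nothing about shuffle cost) is shown below. The name global_index is illustrative, and each interval is modelled as an ordinary list.

```python
from itertools import accumulate

def global_index(batches):
    """Pair every item with a stream-wide index, batch by batch."""
    counts = [len(batch) for batch in batches]
    # Exclusive prefix sum: number of items seen before each batch starts.
    offsets = [0] + list(accumulate(counts))[:-1]
    return [
        [(offset + local, item) for local, item in enumerate(batch)]
        for offset, batch in zip(offsets, batches)
    ]

batches = [list("abcdefghi"), list("jklmn"), list("o")]
for indexed in global_index(batches):
    print(indexed)
```

For the example stream the offsets are 0, 9 and 14, matching the first components of the (0,9), (9,14), (14,15) state values above.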