Compare two lists of lists and fill in blank values

Compare two lists of lists and fill in blank values - python-3.x

I'm reading data from an API and have a list of lists like this:
listData = [[datetime.datetime(2018, 1, 1, 5, 0), -6.78125],
[datetime.datetime(2018, 1, 1, 7, 0), -6.125],
[datetime.datetime(2018, 1, 1, 8, 0), -5.90625]]
I need to create a complete list filling in the missing values. I've created a destination, like this:
listDest = [[datetime.datetime(2018, 1, 1, 5, 0), None],
[datetime.datetime(2018, 1, 1, 6, 0), None],
[datetime.datetime(2018, 1, 1, 7, 0), None],
[datetime.datetime(2018, 1, 1, 8, 0), None]]
The end result should look like this:
[[datetime.datetime(2018, 1, 1, 5, 0), -6.78125],
[datetime.datetime(2018, 1, 1, 6, 0), None],
[datetime.datetime(2018, 1, 1, 7, 0), -6.125],
[datetime.datetime(2018, 1, 1, 8, 0), -5.90625]]
Here is the code I've tried:
for blankTime, blankValue in listDest:
for dataTime, dataValue in listData:
if blankTime == dataTime:
blankIndex = listDest.index(blankTime)
dataIndex = listData.index(dataTime)
listDest[blankIndex] = tempRm7[dataIndex]
This returns the following error, which is confusing since I know that value is in both lists.
ValueError: datetime.datetime(2018, 1, 1, 5, 0) is not in list
I attempted to adapt the methods in this answer but that's for a 1D list and I couldn't figure out how to make it work for my 2D list.

If both lists are sorted, you can merge them and then group them (using heapq.merge/itertools.groupby):
import datetime
from heapq import merge
from itertools import groupby
listData = [[datetime.datetime(2018, 1, 1, 5, 0), -6.78125],
[datetime.datetime(2018, 1, 1, 7, 0), -6.125],
[datetime.datetime(2018, 1, 1, 8, 0), -5.90625]]
listDest = [[datetime.datetime(2018, 1, 1, 5, 0), None],
[datetime.datetime(2018, 1, 1, 6, 0), None],
[datetime.datetime(2018, 1, 1, 7, 0), None],
[datetime.datetime(2018, 1, 1, 8, 0), None]]
out = [next(g) for _, g in groupby(merge(listData, listDest, key=lambda k: k[0]), lambda k: k[0])]
# pretty print to screen:
from pprint import pprint
pprint(out)
Prints:
[[datetime.datetime(2018, 1, 1, 5, 0), -6.78125],
[datetime.datetime(2018, 1, 1, 6, 0), None],
[datetime.datetime(2018, 1, 1, 7, 0), -6.125],
[datetime.datetime(2018, 1, 1, 8, 0), -5.90625]]

Related

Dynamic programming best sum code in python

I am trying to learn dynamic programming by followin an online video. The original video is using javascript and I am trying to use python to implement the same. However, I am not able to locate the error in my python implementation.
The question is as follows
write a fn. bestsum(targetsum, numbers) that takes in a targetsum and
an array of numbers as arguments.
The fn. should return an array containing the shortest combination of
numbers that add up to exactly the targetsum.
If there is a tie for the shortest combination, you may return any of
the shortest.
The javascript implementation is as follows.
const bestSum = (targetSum, numbers, memo={}) => {
if (targetSum in memo) return memo[targetSum];
if (targetSum === 0) return [];
if (targetSum < 0) return null;
let shortest_com = null;
for (let num of numbers) {
const remainder = targetSum - num;
const remainder_com = bestSum(remainder, numbers, memo);
if (remainder_com !== null) {
const combination = [...remainder_com, num];
if (shortest_com === null || combination.length < shortest_com.length) {
shortest_com = combination;
}
}
}
memo[targetSum] = shortest_com
return shortest_com;
};
console.log(bestSum(7, [5, 3, 4, 7]));
console.log(bestSum(8, [2, 3, 5]));
console.log(bestSum(8, [1, 4, 5]));
console.log(bestSum(100, [1, 2, 5, 25]));
Python code I implemented is
from typing import Any, Dict, List, Optional
def best_sum(target: int, numbers: List[int], memo:Dict[int, Any]={}) -> Optional[List[int]]:
if target in memo.keys():
return memo.get(target)
if target == 0:
return []
if target < 0:
return None
shortest_combination: Optional[List] = None
for num in numbers:
partial = best_sum(target=target - num, numbers=numbers, memo=memo)
if partial != None:
print(num)
partial.append(num)
if (shortest_combination == None) or (len(partial) < len(shortest_combination)):
shortest_combination = partial
memo[target] = shortest_combination
return shortest_combination
if __name__ == "__main__":
print(best_sum(target=100, numbers=[1, 2, 5, 25]))
For the test case: target=100, numbers=[1, 2, 5, 25].
Javascript implementation gives.
[ 25, 25, 25, 25 ]
But Python gives.
[25, 1, 1, 2, 1, 2, 1, 2, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25]

The problem is in this snippet:
if partial != None:
partial.append(num)
if (shortest_combination == None) or (len(partial) < len(shortest_combination)):
shortest_combination = partial
The Javascript appoach creates a copy of the list remainder_com with the element num appended. In your approach, you're appending to partial directly without creating a copy. Thus, in every iteration the same list will be used to modifications, which is not desired. Change it to
# Creates a copy of `partial` with `num` appended
combination = partial[:] + [num]
if (shortest_combination == None) or (len(combination) < len(shortest_combination)):
shortest_combination = combination
This outputs [25, 25, 25, 25] as expected.

how to find key-value pairs in a dictionary based on another dictionary keys with for loop?

I need to find the key-value pair in a dictionary, based on keys.
Could someone explain it how can it be done, please?
Sort the keys of word_freq ascendingly.
Please create a new dictionary called word_freq2 based on word_freq with the keys sorted ascendingly.
There are several ways to achieve that goal but many of the ways are beyond what we have covered so far in the course. There is one way that we'll describe employing what you have learned. Please feel free to use this way or any other way you want.
First, extract the keys of word_freq and convert it to a list called keys.
Sort the keys list.
Create an empty dictionary word_freq2.
I am not able to write the for loop for the below question. Any help would be highly appreciated
Use a FOR loop to iterate each value in keys. For each key iterated, find the corresponding value in word_freq and insert the key-value pair to word_freq2.
word_freq = {'love': 25, 'conversation': 1, 'every': 6, "we're": 1, 'plate': 1, 'sour': 1, 'jukebox': 1, 'now': 11, 'taxi': 1, 'fast': 1, 'bag': 1, 'man': 1, 'push': 3, 'baby': 14, 'going': 1, 'you': 16, "don't": 2, 'one': 1, 'mind': 2, 'backseat': 1, 'friends': 1, 'then': 3, 'know': 2, 'take': 1, 'play': 1, 'okay': 1, 'so': 2, 'begin': 1, 'start': 2, 'over': 1, 'body': 17, 'boy': 2, 'just': 1, 'we': 7, 'are': 1, 'girl': 2, 'tell': 1, 'singing': 2, 'drinking': 1, 'put': 3, 'our': 1, 'where': 1, "i'll": 1, 'all': 1, "isn't": 1, 'make': 1, 'lover': 1, 'get': 1, 'radio': 1, 'give': 1, "i'm": 23, 'like': 10, 'can': 1, 'doing': 2, 'with': 22, 'club': 1, 'come': 37, 'it': 1, 'somebody': 2, 'handmade': 2, 'out': 1, 'new': 6, 'room': 3, 'chance': 1, 'follow': 6, 'in': 27, 'may': 2, 'brand': 6, 'that': 2, 'magnet': 3, 'up': 3, 'first': 1, 'and': 23, 'pull': 3, 'of': 6, 'table': 1, 'much': 2, 'last': 3, 'i': 6, 'thrifty': 1, 'grab': 2, 'was': 2, 'driver': 1, 'slow': 1, 'dance': 1, 'the': 18, 'say': 2, 'trust': 1, 'family': 1, 'week': 1, 'date': 1, 'me': 10, 'do': 3, 'waist': 2, 'smell': 3, 'day': 6, 'although': 3, 'your': 21, 'leave': 1, 'want': 2, "let's": 2, 'lead': 6, 'at': 1, 'hand': 1, 'how': 1, 'talk': 4, 'not': 2, 'eat': 1, 'falling': 3, 'about': 1, 'story': 1, 'sweet': 1, 'best': 1, 'crazy': 2, 'let': 1, 'too': 5, 'van': 1, 'shots': 1, 'go': 2, 'to': 2, 'a': 8, 'my': 33, 'is': 5, 'place': 1, 'find': 1, 'shape': 6, 'on': 40, 'kiss': 1, 'were': 3, 'night': 3, 'heart': 3, 'for': 3, 'discovering': 6, 'something': 6, 'be': 16, 'bedsheets': 3, 'fill': 2, 'hours': 2, 'stop': 1, 'bar': 1}
keys = list(word_freq.keys()) #extract the keys of word_freq and convert it to a list called keys
print(keys)
for i in sorted (word_freq.keys()): #Sort the keys list.
print(i)
word_freq2 = {} #Create an empty dictionary word_freq2

It turns out to be a very simple solution after going through the Python Dictionary Examples and Methods
for value in keys:
word_freq2[value] = word_freq.get(value)
print(word_freq2)
Python Dictionary Examples and Methods.

Why do other values change in an ndarray when I try to change a specific cell value?

For example, I have a 3D ndarray of the shape (10,10,10) and whenever I try to change all the cells in this section [5,:,9] to a specific single value I end up changing values in this section too [4,:,9]. Which to me makes no sense. I do not get this behavior when I convert to a list of lists.
I use a simply for loop:
For i in range(0,10):
matrix[5,i, 9]= matrix[5,9,9]
Is there anyway to avoid this? I do not get this behavior when using a list of lists but I don’t wanna convert back and forth between the two as it takes too much processing time.

Doesn't happen that way for me:
In [232]: arr = np.ones((10,10,10),int)
In [233]: arr[5,9,9] = 10
In [234]: for i in range(10): arr[5,i,9]=arr[5,9,9]
In [235]: arr[5,:,9]
Out[235]: array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
In [236]: arr[4,:,9]
Out[236]: array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
or assigning a whole "column" at once:
In [237]: arr[5,:,9] = np.arange(10)
In [239]: arr[5]
Out[239]:
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 2],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 3],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 4],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 5],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 6],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 7],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 8],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 9]])

Plotting the frequency associated with bigrams

I have frequency of each bigrams of a dataset.I need to sort it by descending order and visualise the top n bigrams.This is my frequency associated with each bigrams
{('best', 'price'): 95, ('price', 'range'): 190, ('range', 'got'): 5, ('got', 'diwali'): 2, ('diwali', 'sale'): 2, ('sale', 'simply'): 1, ('simply', 'amazed'): 1, ('amazed', 'performance'): 1, ('performance', 'camera'): 30, ('camera', 'clarity'): 35, ('clarity', 'device'): 1, ('device', 'speed'): 1, ('speed', 'looks'): 1, ('looks', 'display'): 1, ('display', 'everything'): 2, ('everything', 'nice'): 5, ('nice', 'heats'): 2, ('heats', 'lot'): 14, ('lot', 'u'): 2, ('u', 'using'): 3, ('using', 'months'): 20, ('months', 'no'): 10, ('no', 'problems'): 8, ('problems', 'whatsoever'): 1, ('whatsoever', 'great'): 1
Can anyone help me visualise these bigrams?

If I understand you correctly, this is what you need
import seaborn as sns
bg_dict = {('best', 'price'): 95, ('price', 'range'): 190, ('range', 'got'): 5, ('got', 'diwali'): 2, ('diwali', 'sale'): 2, ('sale', 'simply'): 1,
('simply', 'amazed'): 1, ('amazed', 'performance'): 1, ('performance', 'camera'): 30, ('camera', 'clarity'): 35, ('clarity', 'device'): 1,
('device', 'speed'): 1, ('speed', 'looks'): 1, ('looks', 'display'): 1, ('display', 'everything'): 2, ('everything', 'nice'): 5, ('nice', 'heats'): 2, ('heats', 'lot'): 14,
('lot', 'u'): 2, ('u', 'using'): 3, ('using', 'months'): 20, ('months', 'no'): 10, ('no', 'problems'): 8, ('problems', 'whatsoever'): 1, ('whatsoever', 'great'): 1}
bg_dict_sorted = sorted(bg_dict.items(), key=lambda kv: kv[1], reverse=True)
bg, counts = list(zip(*bg_dict_sorted))
bg_str = list(map(lambda x: '-'.join(x), bg))
sns.barplot(bg_str, counts)

Strange Behaviour when Updating Cassandra row

I am using pyspark and pyspark-cassandra.
I have noticed this behaviour on multiple versions of Cassandra(3.0.x and 3.6.x) using COPY, sstableloader, and now saveToCassandra in pyspark.
I have the following schema
CREATE TABLE test (
id int,
time timestamp,
a int,
b int,
c int,
PRIMARY KEY ((id), time)
) WITH CLUSTERING ORDER BY (time DESC);
and the following data
(1, datetime.datetime(2015, 3, 1, 0, 18, 18, tzinfo=<UTC>), 1, 0, 0)
(1, datetime.datetime(2015, 3, 1, 0, 19, 12, tzinfo=<UTC>), 0, 1, 0)
(1, datetime.datetime(2015, 3, 1, 0, 22, 59, tzinfo=<UTC>), 1, 0, 0)
(1, datetime.datetime(2015, 3, 1, 0, 23, 52, tzinfo=<UTC>), 0, 1, 0)
(1, datetime.datetime(2015, 3, 1, 0, 32, 2, tzinfo=<UTC>), 1, 1, 0)
(1, datetime.datetime(2015, 3, 1, 0, 32, 8, tzinfo=<UTC>), 0, 2, 0)
(1, datetime.datetime(2015, 3, 1, 0, 43, 30, tzinfo=<UTC>), 1, 1, 0)
(1, datetime.datetime(2015, 3, 1, 0, 44, 12, tzinfo=<UTC>), 0, 2, 0)
(1, datetime.datetime(2015, 3, 1, 0, 48, 49, tzinfo=<UTC>), 1, 1, 0)
(1, datetime.datetime(2015, 3, 1, 0, 49, 7, tzinfo=<UTC>), 0, 2, 0)
(1, datetime.datetime(2015, 3, 1, 0, 50, 5, tzinfo=<UTC>), 1, 1, 0)
(1, datetime.datetime(2015, 3, 1, 0, 50, 53, tzinfo=<UTC>), 0, 2, 0)
(1, datetime.datetime(2015, 3, 1, 0, 51, 53, tzinfo=<UTC>), 1, 1, 0)
(1, datetime.datetime(2015, 3, 1, 0, 51, 59, tzinfo=<UTC>), 0, 2, 0)
(1, datetime.datetime(2015, 3, 1, 0, 54, 35, tzinfo=<UTC>), 1, 1, 0)
(1, datetime.datetime(2015, 3, 1, 0, 55, 28, tzinfo=<UTC>), 0, 2, 0)
(1, datetime.datetime(2015, 3, 1, 0, 55, 55, tzinfo=<UTC>), 1, 2, 0)
(1, datetime.datetime(2015, 3, 1, 0, 56, 24, tzinfo=<UTC>), 0, 3, 0)
(1, datetime.datetime(2015, 3, 1, 1, 11, 14, tzinfo=<UTC>), 1, 2, 0)
(1, datetime.datetime(2015, 3, 1, 1, 11, 17, tzinfo=<UTC>), 2, 1, 0)
(1, datetime.datetime(2015, 3, 1, 1, 12, 8, tzinfo=<UTC>), 1, 2, 0)
(1, datetime.datetime(2015, 3, 1, 1, 12, 10, tzinfo=<UTC>), 0, 3, 0)
(1, datetime.datetime(2015, 3, 1, 1, 17, 43, tzinfo=<UTC>), 1, 2, 0)
(1, datetime.datetime(2015, 3, 1, 1, 17, 49, tzinfo=<UTC>), 0, 3, 0)
(1, datetime.datetime(2015, 3, 1, 1, 24, 12, tzinfo=<UTC>), 1, 2, 0)
(1, datetime.datetime(2015, 3, 1, 1, 24, 18, tzinfo=<UTC>), 2, 1, 0)
(1, datetime.datetime(2015, 3, 1, 1, 24, 18, tzinfo=<UTC>), 1, 2, 0)
(1, datetime.datetime(2015, 3, 1, 1, 24, 24, tzinfo=<UTC>), 2, 1, 0)
Towards the end of the data, there are two rows which have the same timestamp.
(1, datetime.datetime(2015, 3, 1, 1, 24, 18, tzinfo=<UTC>), 2, 1, 0)
(1, datetime.datetime(2015, 3, 1, 1, 24, 18, tzinfo=<UTC>), 1, 2, 0)
It is my understanding that when I save to Cassandra, one of these will "win" - there will only be one row.
After writing to cassandra using
rdd.saveToCassandra(keyspace, table, ['id', 'time', 'a', 'b', 'c'])
Neither row appears to have won. Rather, the rows seem to have "merged".
1 | 2015-03-01 01:17:43+0000 | 1 | 2 | 0
1 | 2015-03-01 01:17:49+0000 | 0 | 3 | 0
1 | 2015-03-01 01:24:12+0000 | 1 | 2 | 0
1 | 2015-03-01 01:24:18+0000 | 2 | 2 | 0
1 | 2015-03-01 01:24:24+0000 | 2 | 1 | 0
Rather than the 2015-03-01 01:24:18+0000 containing (1, 2, 0) or (2, 1, 0), it contains (2, 2, 0).
What is happening here? I can't for the life of me figure out this behaviour is being caused.

This is a little known effect that comes from the batching together of data. Batching writes assigns the same timestamp to all Inserts in the batch. Next, if two writes are done with the exact same timestamp then there is a special merge rule since there was no "last" write. The Spark Cassandra Connector uses intra-partition batches by default so this is very likely to happen if you have this kind of clobbering of values.
The behavior with two identical write timestamps is a merge based on the Greater value.
Given Table (key, a, b)
Batch
Insert "foo", 2, 1
Insert "foo", 1, 2
End batch
The batch gives both mutations the same timestamp. Cassandra cannot chose a "last-written" since they both happened at the same time, instead it just chooses the greater value of the two. The merged result will be
"foo", 2, 2

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Compare two lists of lists and fill in blank values - python-3.x

Related

Dynamic programming best sum code in python

how to find key-value pairs in a dictionary based on another dictionary keys with for loop?

Why do other values change in an ndarray when I try to change a specific cell value?

Plotting the frequency associated with bigrams

Strange Behaviour when Updating Cassandra row

Categories

Resources