NumPy array of integers to timedelta - python-3.x

I have a numpy array of milliseconds in integers, which I want to convert to an array of Python datetimes via a timedelta operation.
The following MWE works, but I'm convinced there is a more elegant approach or with better performence than multiplication by 1 ms.
start = pd.Timestamp('2016-01-02 03:04:56.789101').to_pydatetime()
dt = np.array([ 19, 14980, 19620, 54964615, 54964655, 86433958])
time_arr = start + dt * timedelta(milliseconds=1)

So your approach produces:
In [56]: start = pd.Timestamp('2016-01-02 03:04:56.789101').to_pydatetime()
In [57]: start
Out[57]: datetime.datetime(2016, 1, 2, 3, 4, 56, 789101)
In [58]: dt = np.array([ 19, 14980, 19620, 54964615, 54964655, 86433958])
In [59]: time_arr = start + dt * timedelta(milliseconds=1)
In [60]: time_arr
Out[60]:
array([datetime.datetime(2016, 1, 2, 3, 4, 56, 808101),
datetime.datetime(2016, 1, 2, 3, 5, 11, 769101),
datetime.datetime(2016, 1, 2, 3, 5, 16, 409101),
datetime.datetime(2016, 1, 2, 18, 21, 1, 404101),
datetime.datetime(2016, 1, 2, 18, 21, 1, 444101),
datetime.datetime(2016, 1, 3, 3, 5, 30, 747101)], dtype=object)
The equivalent using np.datetime64 types:
In [61]: dt.astype('timedelta64[ms]')
Out[61]: array([ 19, 14980, 19620, 54964615, 54964655, 86433958], dtype='timedelta64[ms]')
In [62]: np.datetime64(start)
Out[62]: numpy.datetime64('2016-01-02T03:04:56.789101')
In [63]: np.datetime64(start) + dt.astype('timedelta64[ms]')
Out[63]:
array(['2016-01-02T03:04:56.808101', '2016-01-02T03:05:11.769101',
'2016-01-02T03:05:16.409101', '2016-01-02T18:21:01.404101',
'2016-01-02T18:21:01.444101', '2016-01-03T03:05:30.747101'], dtype='datetime64[us]')
I can produce the same array from your time_arr with np.array(time_arr, dtype='datetime64[us]').
tolist converts these datetime64 items to datetime objects:
In [97]: t1=np.datetime64(start) + dt.astype('timedelta64[ms]')
In [98]: t1.tolist()
Out[98]:
[datetime.datetime(2016, 1, 2, 3, 4, 56, 808101),
datetime.datetime(2016, 1, 2, 3, 5, 11, 769101),
datetime.datetime(2016, 1, 2, 3, 5, 16, 409101),
datetime.datetime(2016, 1, 2, 18, 21, 1, 404101),
datetime.datetime(2016, 1, 2, 18, 21, 1, 444101),
datetime.datetime(2016, 1, 3, 3, 5, 30, 747101)]
or wrap it back in an array to get your time_arr:
In [99]: np.array(t1.tolist())
Out[99]:
array([datetime.datetime(2016, 1, 2, 3, 4, 56, 808101),
...
datetime.datetime(2016, 1, 3, 3, 5, 30, 747101)], dtype=object)
Just for the calculation datatime64 is faster, but with the conversions, it may not be the fastest overall.
https://docs.scipy.org/doc/numpy/reference/arrays.datetime.html

Related

Dynamic programming best sum code in python

I am trying to learn dynamic programming by followin an online video. The original video is using javascript and I am trying to use python to implement the same. However, I am not able to locate the error in my python implementation.
The question is as follows
write a fn. bestsum(targetsum, numbers) that takes in a targetsum and
an array of numbers as arguments.
The fn. should return an array containing the shortest combination of
numbers that add up to exactly the targetsum.
If there is a tie for the shortest combination, you may return any of
the shortest.
The javascript implementation is as follows.
const bestSum = (targetSum, numbers, memo={}) => {
if (targetSum in memo) return memo[targetSum];
if (targetSum === 0) return [];
if (targetSum < 0) return null;
let shortest_com = null;
for (let num of numbers) {
const remainder = targetSum - num;
const remainder_com = bestSum(remainder, numbers, memo);
if (remainder_com !== null) {
const combination = [...remainder_com, num];
if (shortest_com === null || combination.length < shortest_com.length) {
shortest_com = combination;
}
}
}
memo[targetSum] = shortest_com
return shortest_com;
};
console.log(bestSum(7, [5, 3, 4, 7]));
console.log(bestSum(8, [2, 3, 5]));
console.log(bestSum(8, [1, 4, 5]));
console.log(bestSum(100, [1, 2, 5, 25]));
Python code I implemented is
from typing import Any, Dict, List, Optional
def best_sum(target: int, numbers: List[int], memo:Dict[int, Any]={}) -> Optional[List[int]]:
if target in memo.keys():
return memo.get(target)
if target == 0:
return []
if target < 0:
return None
shortest_combination: Optional[List] = None
for num in numbers:
partial = best_sum(target=target - num, numbers=numbers, memo=memo)
if partial != None:
print(num)
partial.append(num)
if (shortest_combination == None) or (len(partial) < len(shortest_combination)):
shortest_combination = partial
memo[target] = shortest_combination
return shortest_combination
if __name__ == "__main__":
print(best_sum(target=100, numbers=[1, 2, 5, 25]))
For the test case: target=100, numbers=[1, 2, 5, 25].
Javascript implementation gives.
[ 25, 25, 25, 25 ]
But Python gives.
[25, 1, 1, 2, 1, 2, 1, 2, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25, 1, 2, 5, 25]
The problem is in this snippet:
if partial != None:
partial.append(num)
if (shortest_combination == None) or (len(partial) < len(shortest_combination)):
shortest_combination = partial
The Javascript appoach creates a copy of the list remainder_com with the element num appended. In your approach, you're appending to partial directly without creating a copy. Thus, in every iteration the same list will be used to modifications, which is not desired. Change it to
# Creates a copy of `partial` with `num` appended
combination = partial[:] + [num]
if (shortest_combination == None) or (len(combination) < len(shortest_combination)):
shortest_combination = combination
This outputs [25, 25, 25, 25] as expected.

A calculation affects an identical (but different) variable in a stack elsewhere in python-3.x?

I am using a stack class to store 2d lists of strings and integers.
The lists serve as tables and I have the following code:
print('pushing')
print(lookup_table)
tables_to_be_tested.push(lookup_table)
print('new table')
print(lookup_table)
print('top of stack: ')
print(tables_to_be_tested.peek())
lookup_table[0][c2index] = c1_value
print('top of stack 2: ')
print(tables_to_be_tested.peek())
The line lookup_table[0][c2index] = c1_value only updates one value in the first list
Here is my output:
pushing
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [39, 50, 38, 53, 28, 37, 49, 52, 31, 42], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
new table
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [39, 50, 38, 53, 28, 37, 49, 52, 31, 42], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
top of stack:
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [39, 50, 38, 53, 28, 37, 49, 52, 31, 42], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
top of stack 2:
[[0, 1, 2, 3, 4, 10, 6, 7, 8, 9], [39, 50, 38, 53, 28, 37, 49, 52, 31, 42], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
The lists are created independently like this: lookup_table = [[],[],[]] and are appended to in a for loop.
The calculation should not affect the 2d list in the stack and yet it does. Why is this? What is a solution?

Swap pair of elements along an axis

I have a 2d numpy array as such:
import numpy as np
a = np.arange(20).reshape((2,10))
# array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
# [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
I want to swap pairs of elements in each row. The desired output looks like this:
# array([[ 9, 0, 2, 1, 4, 3, 6, 5, 8, 7],
# [19, 10, 12, 11, 14, 13, 16, 15, 18, 17]])
I managed to find a solution in 1d:
a = np.arange(10)
# does the job for all pairs except the first
output = np.roll(np.flip(np.roll(a,-1).reshape((-1,2)),1).flatten(),2)
# first pair done manually
output[0] = a[-1]
output[1] = a[0]
Any ideas on a "numpy only" solution for the 2d case ?
Owing to the first pair not exactly subscribing to the usual pair swap, we can do that separately. For the rest, it would relatively straight-forward with reshaping to split axes and flip axis. Hence, it would be -
In [42]: a # 2D input array
Out[42]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
In [43]: b2 = a[:,1:-1].reshape(a.shape[0],-1,2)[...,::-1].reshape(a.shape[0],-1)
In [44]: np.hstack((a[:,[-1,0]],b2))
Out[44]:
array([[ 9, 0, 2, 1, 4, 3, 6, 5, 8, 7],
[19, 10, 12, 11, 14, 13, 16, 15, 18, 17]])
Alternatively, stack and then reshape+flip-axis -
In [50]: a1 = np.hstack((a[:,[0,-1]],a[:,1:-1]))
In [51]: a1.reshape(a.shape[0],-1,2)[...,::-1].reshape(a.shape[0],-1)
Out[51]:
array([[ 9, 0, 2, 1, 4, 3, 6, 5, 8, 7],
[19, 10, 12, 11, 14, 13, 16, 15, 18, 17]])

Plotting the frequency associated with bigrams

I have frequency of each bigrams of a dataset.I need to sort it by descending order and visualise the top n bigrams.This is my frequency associated with each bigrams
{('best', 'price'): 95, ('price', 'range'): 190, ('range', 'got'): 5, ('got', 'diwali'): 2, ('diwali', 'sale'): 2, ('sale', 'simply'): 1, ('simply', 'amazed'): 1, ('amazed', 'performance'): 1, ('performance', 'camera'): 30, ('camera', 'clarity'): 35, ('clarity', 'device'): 1, ('device', 'speed'): 1, ('speed', 'looks'): 1, ('looks', 'display'): 1, ('display', 'everything'): 2, ('everything', 'nice'): 5, ('nice', 'heats'): 2, ('heats', 'lot'): 14, ('lot', 'u'): 2, ('u', 'using'): 3, ('using', 'months'): 20, ('months', 'no'): 10, ('no', 'problems'): 8, ('problems', 'whatsoever'): 1, ('whatsoever', 'great'): 1
Can anyone help me visualise these bigrams?
If I understand you correctly, this is what you need
import seaborn as sns
bg_dict = {('best', 'price'): 95, ('price', 'range'): 190, ('range', 'got'): 5, ('got', 'diwali'): 2, ('diwali', 'sale'): 2, ('sale', 'simply'): 1,
('simply', 'amazed'): 1, ('amazed', 'performance'): 1, ('performance', 'camera'): 30, ('camera', 'clarity'): 35, ('clarity', 'device'): 1,
('device', 'speed'): 1, ('speed', 'looks'): 1, ('looks', 'display'): 1, ('display', 'everything'): 2, ('everything', 'nice'): 5, ('nice', 'heats'): 2, ('heats', 'lot'): 14,
('lot', 'u'): 2, ('u', 'using'): 3, ('using', 'months'): 20, ('months', 'no'): 10, ('no', 'problems'): 8, ('problems', 'whatsoever'): 1, ('whatsoever', 'great'): 1}
bg_dict_sorted = sorted(bg_dict.items(), key=lambda kv: kv[1], reverse=True)
bg, counts = list(zip(*bg_dict_sorted))
bg_str = list(map(lambda x: '-'.join(x), bg))
sns.barplot(bg_str, counts)

Load data from file and normalize

How to normalize data loaded from file? Here what I have. Data looks kind of like this:
65535, 3670, 65535, 3885, -0.73, 1
65535, 3962, 65535, 3556, -0.72, 1
Last value in each line is a target. I want to have the same structure of the data but with normalized values.
import numpy as np
dataset = np.loadtxt('infrared_data.txt', delimiter=',')
# select first 5 columns as the data
X = dataset[:, 0:5]
# is that correct? Should I normalize along 0 axis?
normalized_X = preprocessing.normalize(X, axis=0)
y = dataset[:, 5]
Now the question is, how to pack correctly normalized_X and y back, that it has the structure:
dataset = [[normalized_X[0], y[0]],[normalized_X[1], y[1]],...]
It sounds like you're asking for np.column_stack. For example, let's set up some dummy data:
import numpy as np
x = np.arange(25).reshape(5, 5)
y = np.arange(5) + 1000
Which gives us:
X:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
Y:
array([1000, 1001, 1002, 1003, 1004])
And we want:
new = np.column_stack([x, y])
Which gives us:
New:
array([[ 0, 1, 2, 3, 4, 1000],
[ 5, 6, 7, 8, 9, 1001],
[ 10, 11, 12, 13, 14, 1002],
[ 15, 16, 17, 18, 19, 1003],
[ 20, 21, 22, 23, 24, 1004]])
If you'd prefer less typing, you can also use:
In [4]: np.c_[x, y]
Out[4]:
array([[ 0, 1, 2, 3, 4, 1000],
[ 5, 6, 7, 8, 9, 1001],
[ 10, 11, 12, 13, 14, 1002],
[ 15, 16, 17, 18, 19, 1003],
[ 20, 21, 22, 23, 24, 1004]])
However, I'd discourage using np.c_ for anything other than interactive use, simply due to readability concerns.

Resources