How to correctly insert integers into sqlite3 with Python

I need to insert rows from an Excel file into a sqlite3 database I created.
So far I have converted the Excel file into a DataFrame, created the database and the table with the fields I wanted, and used a for loop to insert my rows with "insert into tablename values (?,...,?)", (value1,...,valuen). However, only the date, which has the text type, is readable in the database. All the integers are stored as bytes, and even int.from_bytes() doesn't give me my integers back in the right form.
Can anyone help?
devices = df['id_device']
time = df['utc_datetime']
vote_yes = df['yes']
vote_neutre = df['neutre']
vote_no = df['no']
questions = ['question']*len(df)
kpi = ['KPI']*len(df)
id_status = [None]*len(df)
indexing = [index for index in range(len(df))]
base = list(map(lambda l,t,x,y,z,k,status , quest , index : [l,t.to_pydatetime(),x,y,z , k , status , quest , index] , devices , time , vote_yes , vote_neutre , vote_no , kpi , id_status , questions , indexing ))
base = [[507, datetime.datetime(2016, 8, 1, 11, 10, 30), 1, 0, 0, 'KPI', None, 'question', 0],
[507, datetime.datetime(2016, 8, 1, 11, 40, 33), 2, 0, 0, 'KPI', None, 'question', 1],
[507, datetime.datetime(2016, 8, 1, 12, 10, 39), 5, 3, 1, 'KPI', None, 'question', 2],
[507, datetime.datetime(2016, 8, 1, 13, 10, 43), 1, 0, 0, 'KPI', None, 'question', 3],
[507, datetime.datetime(2016, 8, 1, 14, 40, 43), 2, 1, 0, 'KPI', None, 'question', 4],
[507, datetime.datetime(2016, 8, 1, 15, 10, 47), 2, 0, 0, 'KPI', None, 'question', 5],
[507, datetime.datetime(2016, 8, 1, 16, 10, 47), 2, 0, 0, 'KPI', None, 'question', 6],
[507, datetime.datetime(2016, 8, 1, 16, 40, 51), 2, 1, 0, 'KPI', None, 'question', 7],
[507, datetime.datetime(2016, 8, 1, 17, 10, 56), 1, 2, 0, 'KPI', None, 'question', 8],
[507, datetime.datetime(2016, 8, 1, 17, 40, 57), 1, 0, 0, 'KPI', None, 'question', 9]]
cur = conn.cursor()
cur.execute('''create table if not exists coord4 (device int , time text)''')
for line in base:
    cur.execute('''insert into coord4 values (?,?)''', (line[0], line[1]))
conn.commit()
res = cur.execute('select * from coord4')
print(res.fetchone())
#output
(b'\xfb\x01\x00\x00\x00\x00\x00\x00', '2016-08-01 11:10:30')
This is my code, if you need it.

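The solution I was looking for was to convert each value to a built-in int before inserting it. The values taken from the DataFrame are NumPy integers (numpy.int64), which sqlite3 does not recognise as integers, so it stores their raw bytes instead: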
for line in base:
    cur.execute('''insert into coord4 values (?,?)''', (int(line[0]), line[1]))
conn.commit()
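A minimal sketch of why the conversion helps, assuming the values came from a pandas column of NumPy 64-bit integers. The table name coord4 is taken from the question; the in-memory database is only there for illustration. The 8 bytes shown in the question's output are the little-endian encoding of 507, and inserting a built-in int stores a real integer:

```python
import datetime
import sqlite3

# The blob from the question is 507 encoded as a little-endian 64-bit integer.
raw = b'\xfb\x01\x00\x00\x00\x00\x00\x00'
decoded = int.from_bytes(raw, byteorder='little')

# In-memory database, used here only for illustration.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('create table if not exists coord4 (device int, time text)')

line = [507, datetime.datetime(2016, 8, 1, 11, 10, 30)]
# int() yields a built-in integer that sqlite3 stores as INTEGER;
# the datetime is passed as text explicitly.
cur.execute('insert into coord4 values (?,?)',
            (int(line[0]), line[1].isoformat(sep=' ')))
conn.commit()

row = cur.execute('select * from coord4').fetchone()
print(decoded, row)
conn.close()
```

If int.from_bytes() returned an unexpected number, the byte order is a likely reason: with byteorder='big' these bytes do not decode to 507.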

Related

Create a torch tensor with desired values

I want to create a torch tensor of size 100 with values 10 and 100.
For example, the following gives a tensor containing only the values 5 and 6:
torch.randint(5,7,(100,))
tensor([6, 6, 6, 5, 5, 6, 6, 6, 6, 5, 6, 6, 6, 6, 6, 5, 6, 5, 5, 6, 5, 5, 5, 5,
6, 5, 5, 5, 5, 5, 6, 6, 6, 5, 6, 6, 5, 5, 5, 5, 6, 5, 5, 5, 5, 5, 6, 5,
5, 6, 5, 6, 5, 6, 5, 6, 6, 6, 6, 5, 6, 6, 6, 5, 5, 5, 6, 6, 6, 6, 5, 6,
5, 5, 5, 5, 6, 6, 5, 6, 6, 6, 5, 5, 6, 6, 5, 6, 6, 6, 5, 5, 5, 5, 5, 6,
6, 6, 5, 6])
Instead of this, I want a tensor that contains only the values 10 and 100, not the integers between them. How do I do that?
Thanks in advance.
If you sample from {0, 1}, then a simple linear mapping that sends 0 to 10 and 1 to 100 will suffice.
Here x -> (b-a)x + a = (100-10)x + 10 = 90x + 10 will work:
>>> rand = torch.randint(0, 2, (100,))
>>> rand
tensor([0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0,
0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0,
0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1,
0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1,
1, 1, 0, 0])
>>> 90*rand+10
tensor([ 10, 10, 100, 100, 100, 100, 10, 100, 100, 100, 100, 10, 10, 100,
100, 10, 100, 100, 10, 100, 100, 100, 10, 10, 10, 10, 10, 100,
10, 10, 100, 10, 100, 100, 10, 10, 100, 10, 10, 10, 10, 10,
100, 10, 10, 100, 10, 10, 10, 100, 100, 100, 100, 10, 100, 100,
10, 10, 100, 10, 100, 10, 100, 10, 100, 100, 10, 10, 10, 10,
100, 100, 10, 10, 100, 100, 10, 10, 10, 100, 10, 10, 100, 10,
100, 100, 10, 10, 100, 10, 100, 10, 10, 10, 100, 100, 100, 100,
10, 10])
You can also achieve that by using the Python function random.choices() to create a list of random numbers and then converting it to a tensor:
import random
import torch
list_numbers = random.choices([100,10], k=100)
random_numbers = torch.tensor(list_numbers)  # torch.tensor keeps the integer dtype; torch.Tensor would give floats
print(random_numbers)

A calculation affects an identical (but different) variable in a stack elsewhere in python-3.x?

I am using a stack class to store 2d lists of strings and integers.
The lists serve as tables and I have the following code:
print('pushing')
print(lookup_table)
tables_to_be_tested.push(lookup_table)
print('new table')
print(lookup_table)
print('top of stack: ')
print(tables_to_be_tested.peek())
lookup_table[0][c2index] = c1_value
print('top of stack 2: ')
print(tables_to_be_tested.peek())
The line lookup_table[0][c2index] = c1_value only updates one value in the first list.
Here is my output:
pushing
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [39, 50, 38, 53, 28, 37, 49, 52, 31, 42], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
new table
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [39, 50, 38, 53, 28, 37, 49, 52, 31, 42], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
top of stack:
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [39, 50, 38, 53, 28, 37, 49, 52, 31, 42], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
top of stack 2:
[[0, 1, 2, 3, 4, 10, 6, 7, 8, 9], [39, 50, 38, 53, 28, 37, 49, 52, 31, 42], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
The lists are created independently like this: lookup_table = [[],[],[]] and are appended to in a for loop.
The calculation should not affect the 2d list in the stack and yet it does. Why is this? What is a solution?
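A likely explanation, assuming push() stores a reference to the list rather than a copy: the stack entry and lookup_table are then the same object, so a change made through one name is visible through the other. The Stack class below is a hypothetical stand-in, since the original class is not shown; the sketch compares pushing the list itself with pushing a copy.deepcopy() of it:

```python
import copy


class Stack:
    """Hypothetical minimal stack; the question's own class is not shown."""

    def __init__(self):
        self._items = []

    def push(self, item):
        # Stores a reference to item, not a copy of it.
        self._items.append(item)

    def peek(self):
        return self._items[-1]


lookup_table = [[0, 1, 2], [39, 50, 38], [1, 1, 1]]

aliased = Stack()
aliased.push(lookup_table)                 # same object as lookup_table

copied = Stack()
copied.push(copy.deepcopy(lookup_table))   # independent nested copy

lookup_table[0][1] = 10

print(aliased.peek()[0])  # reflects the later assignment
print(copied.peek()[0])   # unchanged
```

A shallow copy (list(lookup_table) or lookup_table[:]) would not be enough here, because the inner lists would still be shared.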

Plotting the frequency associated with bigrams

I have the frequency of each bigram in a dataset. I need to sort the bigrams in descending order of frequency and visualise the top n. This is the frequency associated with each bigram:
{('best', 'price'): 95, ('price', 'range'): 190, ('range', 'got'): 5, ('got', 'diwali'): 2, ('diwali', 'sale'): 2, ('sale', 'simply'): 1, ('simply', 'amazed'): 1, ('amazed', 'performance'): 1, ('performance', 'camera'): 30, ('camera', 'clarity'): 35, ('clarity', 'device'): 1, ('device', 'speed'): 1, ('speed', 'looks'): 1, ('looks', 'display'): 1, ('display', 'everything'): 2, ('everything', 'nice'): 5, ('nice', 'heats'): 2, ('heats', 'lot'): 14, ('lot', 'u'): 2, ('u', 'using'): 3, ('using', 'months'): 20, ('months', 'no'): 10, ('no', 'problems'): 8, ('problems', 'whatsoever'): 1, ('whatsoever', 'great'): 1
Can anyone help me visualise these bigrams?
If I understand you correctly, this is what you need
import seaborn as sns
bg_dict = {('best', 'price'): 95, ('price', 'range'): 190, ('range', 'got'): 5, ('got', 'diwali'): 2, ('diwali', 'sale'): 2, ('sale', 'simply'): 1,
('simply', 'amazed'): 1, ('amazed', 'performance'): 1, ('performance', 'camera'): 30, ('camera', 'clarity'): 35, ('clarity', 'device'): 1,
('device', 'speed'): 1, ('speed', 'looks'): 1, ('looks', 'display'): 1, ('display', 'everything'): 2, ('everything', 'nice'): 5, ('nice', 'heats'): 2, ('heats', 'lot'): 14,
('lot', 'u'): 2, ('u', 'using'): 3, ('using', 'months'): 20, ('months', 'no'): 10, ('no', 'problems'): 8, ('problems', 'whatsoever'): 1, ('whatsoever', 'great'): 1}
bg_dict_sorted = sorted(bg_dict.items(), key=lambda kv: kv[1], reverse=True)
bg, counts = list(zip(*bg_dict_sorted))
bg_str = list(map(lambda x: '-'.join(x), bg))
sns.barplot(x=bg_str, y=list(counts))  # keyword arguments; recent seaborn versions do not accept x and y positionally
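The code above plots every bigram. To keep only the top n, one option is collections.Counter.most_common(), sketched here on a subset of the question's data. The names top_n, labels and values are illustrative, and n = 3 is an arbitrary choice:

```python
from collections import Counter

# Subset of the bigram frequencies from the question.
bigram_counts = {
    ('best', 'price'): 95,
    ('price', 'range'): 190,
    ('range', 'got'): 5,
    ('performance', 'camera'): 30,
    ('camera', 'clarity'): 35,
}

n = 3  # number of bigrams to keep (arbitrary for this sketch)

# most_common(n) returns the n highest counts in descending order.
top_n = Counter(bigram_counts).most_common(n)

labels = ['-'.join(pair) for pair, _ in top_n]
values = [count for _, count in top_n]

print(labels)
print(values)
```

The resulting labels and values lists can then be passed to the plotting call as x and y.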

Get the list of RGB pixel values of each superpixel

I have an RGB image of dimension (224,224,3). I applied superpixel segmentation to it using the SLIC algorithm,
as follows:
img= skimageIO.imread("first_image.jpeg")
print('img shape', img.shape) # (224,224,3)
segments_slic = slic(img, n_segments=1000, compactness=0.01, sigma=1) # Up to 1000 segments
segments_slic.shape
(224,224)
The number of returned segments is:
np.max(segments_slic)
Out[49]: 595
The labels run from 0 to 595, so we have 596 superpixels (regions).
Let's take a look at segments_slic[0]:
segments_slic[0]
Out[51]:
array([ 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5,
5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7,
8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9,
10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 12, 12,
12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14,
14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16,
16, 16, 16, 16, 17, 17, 17, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18,
18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 20, 20,
20, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
21, 21, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23,
23, 23, 23, 23, 23, 23, 24, 24, 24, 24, 24, 24, 24, 25, 25, 25, 25,
25, 25, 25])
What would I like to get?
For each superpixel region, build two arrays as follows:
1) An array containing the indexes of the pixels belonging to the same superpixel.
For instance,
superpixel_list[0] contains all the indexes of the pixels belonging to superpixel 0.
superpixel_list[400] contains all the indexes of the pixels belonging to superpixel 400.
2) superpixel_pixel_values[0]: contains the pixel values (in RGB) of the pixels belonging to superpixel 0.
For instance, let's say that pixels 0, 24, 29 and 53 belong to superpixel 0. Then we get
superpixel[0]= [[223,118,33],[245,222,198],[98,17,255],[255,255,0]]# RGB values of pixels belonging to superpixel 0
What is an efficient/optimized way to do that? (I have a dataset of images to loop over.)
EDIT-1
def sp_idx(s, index = True):
    u = np.unique(s)
    if index:
        return [np.where(s == i) for i in u]
    else:
        return [s[s == i] for i in u]
        #return [s[np.where(s == i)] for i in u] gives the same but is slower
superpixel_list = sp_idx(segments_slic)
superpixel = sp_idx(segments_slic, index = False)
In superpixel_list we are supposed to get a list containing the indexes of the pixels belonging to the same superpixel.
For instance,
superpixel_list[0] is supposed to hold all the indexes of the pixels assigned to superpixel 0.
However, I get the following:
superpixel_list[0]
Out[73]:
(array([ 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5,
5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7,
7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 10,
10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13]),
array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5,
6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6,
7, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 0, 1,
2, 3, 4, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2]))
Why two arrays?
In superpixel[0], for instance, we are supposed to get the RGB values of each pixel assigned to superpixel 0, as follows:
for instance, if pixels 0, 24, 29 and 53 are assigned to superpixel 0, then:
superpixel[0]= [[223,118,33],[245,222,198],[98,17,255],[255,255,0]]
However, when I use your function I get the following:
superpixel[0]
Out[79]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
Thank you for your help
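This can be done using np.where and the resulting indices. On a 2D label array, np.where returns a tuple of two arrays (row indices and column indices), which is why two arrays appear for each superpixel. Indexing img with that tuple gives the RGB values of those pixels.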
def sp_idx(s, index = True):
    u = np.unique(s)
    return [np.where(s == i) for i in u]
superpixel_list = sp_idx(segments_slic)
superpixel = [img[idx] for idx in superpixel_list]
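A small sketch of the same idea on toy data, so the shapes are easy to check by hand. The 2x3 label array and the synthetic image are assumptions made for illustration, not the question's data. np.ravel_multi_index is used as one way to turn the (row, column) pairs into single flat pixel indexes, in case that is the form wanted:

```python
import numpy as np

# Toy label map: 2 rows x 3 columns, two superpixels (0 and 1).
segments = np.array([[0, 0, 1],
                     [0, 1, 1]])

# Synthetic "RGB" image of shape (2, 3, 3); each pixel holds 3 consecutive numbers.
img = np.arange(2 * 3 * 3).reshape(2, 3, 3)

labels = np.unique(segments)

# For each label, np.where returns a tuple (row_indices, col_indices).
superpixel_list = [np.where(segments == label) for label in labels]

# Indexing the image with that tuple yields an (n_pixels, 3) array of RGB values.
superpixel = [img[idx] for idx in superpixel_list]

# Optional: flat pixel indexes (row * width + col) instead of (row, col) pairs.
flat_indexes = [np.ravel_multi_index(idx, segments.shape) for idx in superpixel_list]

print(superpixel_list[0])
print(superpixel[0])
print(flat_indexes[0])
```

For many labels per image, this loop scans the label array once per superpixel; that cost may matter on a large dataset.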

NumPy array of integers to timedelta

I have a NumPy array of integer milliseconds, which I want to convert to an array of Python datetimes via a timedelta operation.
The following MWE works, but I'm convinced there is a more elegant or better-performing approach than multiplying by 1 ms.
start = pd.Timestamp('2016-01-02 03:04:56.789101').to_pydatetime()
dt = np.array([ 19, 14980, 19620, 54964615, 54964655, 86433958])
time_arr = start + dt * timedelta(milliseconds=1)
So your approach produces:
In [56]: start = pd.Timestamp('2016-01-02 03:04:56.789101').to_pydatetime()
In [57]: start
Out[57]: datetime.datetime(2016, 1, 2, 3, 4, 56, 789101)
In [58]: dt = np.array([ 19, 14980, 19620, 54964615, 54964655, 86433958])
In [59]: time_arr = start + dt * timedelta(milliseconds=1)
In [60]: time_arr
Out[60]:
array([datetime.datetime(2016, 1, 2, 3, 4, 56, 808101),
datetime.datetime(2016, 1, 2, 3, 5, 11, 769101),
datetime.datetime(2016, 1, 2, 3, 5, 16, 409101),
datetime.datetime(2016, 1, 2, 18, 21, 1, 404101),
datetime.datetime(2016, 1, 2, 18, 21, 1, 444101),
datetime.datetime(2016, 1, 3, 3, 5, 30, 747101)], dtype=object)
The equivalent using np.datetime64 types:
In [61]: dt.astype('timedelta64[ms]')
Out[61]: array([ 19, 14980, 19620, 54964615, 54964655, 86433958], dtype='timedelta64[ms]')
In [62]: np.datetime64(start)
Out[62]: numpy.datetime64('2016-01-02T03:04:56.789101')
In [63]: np.datetime64(start) + dt.astype('timedelta64[ms]')
Out[63]:
array(['2016-01-02T03:04:56.808101', '2016-01-02T03:05:11.769101',
'2016-01-02T03:05:16.409101', '2016-01-02T18:21:01.404101',
'2016-01-02T18:21:01.444101', '2016-01-03T03:05:30.747101'], dtype='datetime64[us]')
I can produce the same array from your time_arr with np.array(time_arr, dtype='datetime64[us]').
tolist converts these datetime64 items to datetime objects:
In [97]: t1=np.datetime64(start) + dt.astype('timedelta64[ms]')
In [98]: t1.tolist()
Out[98]:
[datetime.datetime(2016, 1, 2, 3, 4, 56, 808101),
datetime.datetime(2016, 1, 2, 3, 5, 11, 769101),
datetime.datetime(2016, 1, 2, 3, 5, 16, 409101),
datetime.datetime(2016, 1, 2, 18, 21, 1, 404101),
datetime.datetime(2016, 1, 2, 18, 21, 1, 444101),
datetime.datetime(2016, 1, 3, 3, 5, 30, 747101)]
or wrap it back in an array to get your time_arr:
In [99]: np.array(t1.tolist())
Out[99]:
array([datetime.datetime(2016, 1, 2, 3, 4, 56, 808101),
...
datetime.datetime(2016, 1, 3, 3, 5, 30, 747101)], dtype=object)
For the calculation alone, datetime64 is faster, but once the conversions are included it may not be the fastest overall.
https://docs.scipy.org/doc/numpy/reference/arrays.datetime.html
