I'm having trouble using numpy and handling a list - python-3.x

This code reads a CSV file line by line and counts the occurrences of each Unicode code, but I can't understand the two parts of the code shown below. I've already googled but I couldn't find the answer. Could you give me some advice?
1) Why should I use numpy here instead of []?
emoji_time = np.zeros(200)
2) What does -1 mean?
emoji_time[len(emoji_list)-1] = 1
This is the code result:
0x100039, 47,
0x10002D, 121,
0x100029, 30,
0x100078, 6,
unicode_count.py
import codecs
import re
import numpy as np

file0 = "./message.tsv"
f0 = codecs.open(file0, "r", "utf-8")
list0 = f0.readlines()
f0.close()
print(len(list0))

len_list = len(list0)
emoji_list = []
emoji_time = np.zeros(200)

for i in range(len_list):
    a = "0x1000[0-9A-F][0-9A-F]"
    if "0x1000" in list0[i]:  # 0x and 0x1000: same number
        b = re.findall(a, list0[i])
        # print(b)
        for j in range(len(b)):
            if b[j] not in emoji_list:
                emoji_list.append(b[j])
                emoji_time[len(emoji_list)-1] = 1
            else:
                c = emoji_list.index(b[j])
                emoji_time[c] += 1
print(len(emoji_list))

1) If you use a list instead of a numpy array, the result should not change in this case. You can try it yourself by running the same code with emoji_time = np.zeros(200) replaced by emoji_time = [0]*200.
2) emoji_time[len(emoji_list)-1] = 1. What this line does is the following: if an emoji appears for the first time, its count in emoji_time, the list that holds the number of times each emoji occurred, is set to 1. len(emoji_list)-1 gives the position in emoji_time; it is based on the length of emoji_list because the new emoji has just been appended to the end of that list (the minus 1 is needed only because list indexing in Python starts from 0).
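For comparison, the same tally can be kept without a fixed-size array. The following is only a minimal sketch, not the asker's code: it assumes the same 0x1000XX pattern, uses collections.Counter from the standard library, and the function name count_codes and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Same pattern as in the question: 0x1000 followed by two hex digits.
CODE_PATTERN = re.compile(r"0x1000[0-9A-F][0-9A-F]")


def count_codes(lines):
    """Count how often each code appears across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(CODE_PATTERN.findall(line))
    return counts


# Hypothetical sample input standing in for the lines of message.tsv.
sample = ["hi 0x100039 0x10002D", "0x100039 again", "no codes here"]
counts = count_codes(sample)
for code, n in counts.items():
    print(f"{code}, {n},")
```

A Counter grows as new codes appear, so there is no need to guess an upper bound such as 200 in advance, and no index arithmetic with len(emoji_list)-1.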

Related

How to extract many groups of cells separated by a specified number of rows in excel using python and write it to an other file?

I have a csv file which has around 58 million cells containing numerical data. I want to extract blocks of 16 cells (4 rows x 4 columns) that are 49 rows apart.
Let me describe it clearly.
[Image: The data I need to extract]
The above image shows the first set of data that is to be extracted (rows 23 to 26, columns 92 to 95). This data has to be written to another csv file (preferably in a single row).
Then I move down 49 rows (to row 72) and extract the next 4 rows x 4 columns, as shown in the image below.
[Image: Next set of data]
Similarly, I need to keep going until I reach the end of the file.
[Image: Third set]
The image above shows the next set.
I have to keep going until I reach the end of the file, extracting thousands of such sets.
I wrote some code for this, but it's not working, and I don't know where the mistake is. I am attaching it here.
import pandas as pd
import numpy
df = pd.read_csv('TS_trace31.csv')
# print(numpy.shape(df))
df = pd.read_csv('TS_trace31.csv')
# print(numpy.shape(df))
arrY = []
ex = 0
for i in range(len(df)):
    if i == 0:
        for j in range(4):
            l = (df.iloc[j+21+i*(49), 91:95]).tolist()
            arrY.append(l)
    else:
        for j in range(4):
            if j+22+i*(49) >= len(df):
                ex = 1
                break
            # print(j)
            l = (df.iloc[j+21+i*(49), 91:95]).tolist()
            arrY.append(l)
    if ex == 1:
        break
# print(arrY)
a = []
for i in range(len(arrY) - 3):
    p = arrY[i]+arrY[i+1]+arrY[i+2]+arrY[i+3]
    a.append(p)
print(numpy.shape(a))
numpy.savetxt('myfile.csv', a, delimiter=',')
Using the above code, I didn't get the result I wanted.
Please help me with this and point out where I have gone wrong.
I couldn't attach my csv file here, so please try any sample sheet that you have, or create a simple one.
Thanks in advance! Have a great day.
I don't know exactly what you are doing in your code,
but I wrote my own:
import csv
from itertools import chain

CSV_PATH = 'TS_trace31.csv'
new_data = []
# newline='' is what the csv module recommends when opening files
with open(CSV_PATH, 'r', newline='') as csvfile:
    reader = csv.reader(csvfile)
    # row_num for storing big jumps e.g. 23, 72, 121 ...
    row_num = 23
    # n for storing the group number 0 - 3
    # with n we can find the 23, 24, 25, 26
    n = 0
    # row_group for storing every group of 4 rows
    row_group = []
    # looping over every row in main file
    for row in reader:
        if reader.line_num == row_num + n:
            # for the first time this is going to be 23 + 0
            # then we add one number to the n
            # so the next cycle will be 24 and so on
            n += 1
            print(reader.line_num)
            # add each row to its group
            row_group.append(row[91:95])
            # check if we are at the end of the group e.g. 26
            if n == 4:
                # reset the group number
                n = 0
                # add the jump to main row number
                row_num += 49
                # combine all the row_group to a single row
                new_data.append(list(chain(*row_group)))
                # clear the row_group for next set of rows
                row_group.clear()
                print('='*50)
        else:
            continue
# and finally write all the rows in a new file
with open('myfile.csv', 'w', newline='') as new_csvfile:
    writer = csv.writer(new_csvfile)
    writer.writerows(new_data)
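The row selection above can also be expressed with modular arithmetic, which makes it easy to try on synthetic data before running it on the real file. This is only a sketch under assumptions: the function name extract_blocks and its parameters are hypothetical, rows are counted from 1 with enumerate (which matches csv.reader.line_num only when no field spans several lines), and an incomplete block at the end of the data is dropped.

```python
from itertools import chain


def extract_blocks(rows, first_row=23, height=4, step=49, col_start=91, col_end=95):
    """Flatten each block of `height` rows into one output row.

    `rows` is any iterable of row lists; row numbers are 1-based.
    """
    out = []
    group = []
    for line_num, row in enumerate(rows, start=1):
        offset = line_num - first_row
        # A row belongs to a block when it is one of the first `height`
        # rows of a `step`-row cycle that starts at `first_row`.
        if offset >= 0 and offset % step < height:
            group.append(row[col_start:col_end])
            if len(group) == height:
                out.append(list(chain.from_iterable(group)))
                group = []
    return out


# Synthetic 130 x 100 grid; each cell records its own 1-based position.
grid = [[f"r{r}c{c}" for c in range(1, 101)] for r in range(1, 131)]
blocks = extract_blocks(grid)
print(len(blocks), blocks[0][:4], blocks[1][0])
```

To use it on the real file, one could pass csv.reader(csvfile) as rows and write the returned list with csv.writer(...).writerows(...), as in the answer above.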

Reading Files in Parallel mpi4py

I have a series of n files that I'd like to read in parallel using mpi4py. Every file contains a column vector and, as the final result, I want to obtain a matrix containing all the single vectors as X = [x1 x2 ... xn].
In the first part of the code I create the list containing all the file names and distribute parts of the list to the different cores through the scatter method.
import numpy as np
import pandas as pd
from mpi4py import MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
nprocs = comm.Get_size()
folder = "data/" # Input directory
files = [] # File List
# Create File List -----------------------------------------------------------
if rank == 0:
    for i in range(1,2000):
        filename = "file_" + str(i) + ".csv"
        files = np.append(files,filename)
    print("filelist complete!")
    # Determine the size of each sub task
    ave, res = divmod(files.size, nprocs)
    counts = [ave + 1 if p < res else ave for p in range(nprocs)]
    # Determine starting and ending indices of each sub-task
    starts = [sum(counts[:p]) for p in range(nprocs)]
    ends = [sum(counts[:p+1]) for p in range(nprocs)]
    # Convert data into list of arrays
    fileList = [files[starts[p]:ends[p]] for p in range(nprocs)]
else:
    fileList = None
fileList = comm.scatter(fileList, root = 0)
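The partitioning step can be checked on its own, without MPI. The sketch below is an illustration rather than part of the original program: split_evenly is a hypothetical helper that uses the same divmod idea on a plain Python list and keeps a running start index instead of recomputing sum(counts[:p]) for every process.

```python
def split_evenly(items, nparts):
    """Split `items` into `nparts` contiguous chunks whose sizes differ by at most one."""
    ave, res = divmod(len(items), nparts)
    # The first `res` chunks get one extra item each.
    counts = [ave + 1 if p < res else ave for p in range(nparts)]
    chunks = []
    start = 0
    for count in counts:
        chunks.append(items[start:start + count])
        start += count
    return chunks


# Same naming scheme as above; range(1, 2000) yields 1999 names.
files = [f"file_{i}.csv" for i in range(1, 2000)]
chunks = split_evenly(files, 4)
print([len(c) for c in chunks])
```

The resulting list of chunks has one entry per process, which is the shape that comm.scatter expects on the root rank.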
Here I create a matrix X in which to store the vectors.
# Variables Initialization ---------------------------------------------------
# Creation Support Vector
vector = pd.read_csv(folder + fileList[0])
vector = vector.values
vectorLength = len(vector)
# Matrix
X = np.ones((vectorLength, len(fileList)))
# ----------------------------------------------------------------------------
Here, I read the different files and store each column vector in the matrix X. With the gather method I collect the X matrices computed by the individual cores into a single object X. The result of the gather method is a list of 2D numpy arrays. As a final step, I reorganize the list X into a matrix.
# Reading Files -----------------------------------------------------------
for i in range(len(fileList)):
    data = pd.read_csv(folder + fileList[i])
    data = np.array(data.values)
    X[:,i] = data[:,0]
X = comm.gather(X, root = 0)
if rank == 0:
    X_tot = np.empty((vectorLength, 1))
    for i in range(nprocs):
        X_proc = np.array(X[i])
        X_tot = np.append(X_tot, X_proc, axis=1)
    X_tot = X_tot[:,1:]
    X = X_tot
    del X_tot
    print("printing X", X)
The code works fine. I tested it on a small dataset and it did what it is meant to do. However, when I tried to run it on a large dataset I got the following error:
X = comm.gather(X[:,1:], root = 0)
File "mpi4py/MPI/Comm.pyx", line 1578, in mpi4py.MPI.Comm.gather
File "mpi4py/MPI/msgpickle.pxi", line 773, in mpi4py.MPI.PyMPI_gather
File "mpi4py/MPI/msgpickle.pxi", line 778, in mpi4py.MPI.PyMPI_gather
File "mpi4py/MPI/msgpickle.pxi", line 191, in mpi4py.MPI.pickle_allocv
File "mpi4py/MPI/msgpickle.pxi", line 182, in mpi4py.MPI.pickle_alloc
SystemError: Negative size passed to PyBytes_FromStringAndSize
It seems like a very generic error; however, I can process the same data in serial mode without problems, or in parallel when not using all n files. I also noticed that only the rank 0 core seems to work, while the others seem to do nothing.
This is my first project using mpi4py, so I'm sorry if the code is not perfect or if I have made any conceptual mistakes.
This error typically occurs when the data passed between MPI processes exceeds a certain size (I think 2GB). It's supposed to be fixed with future MPI versions, but for now, you'll probably have to resort to a workaround like storing your data on the hard disk and reading it with each process separately...
See for example here: https://github.com/mpi4py/mpi4py/issues/23
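To get a rough idea of whether the gathered data could hit such a limit, you can estimate its raw size before communicating. This is only a back-of-the-envelope sketch: the threshold of 2**31 - 1 bytes reflects the "2GB" figure mentioned above and is an assumption, not a documented mpi4py constant; the helper names and the example sizes are hypothetical; and pickling adds some overhead on top of the raw bytes.

```python
# Assumed threshold: largest value of a signed 32-bit integer (about 2 GiB).
INT32_MAX = 2**31 - 1


def raw_nbytes(n_rows, n_cols, itemsize=8):
    """Raw size in bytes of an n_rows x n_cols array (itemsize=8 for float64)."""
    return n_rows * n_cols * itemsize


def may_exceed_limit(n_rows, n_cols, itemsize=8, limit=INT32_MAX):
    """True if the raw array alone is already larger than the assumed limit."""
    return raw_nbytes(n_rows, n_cols, itemsize) > limit


# Hypothetical sizes: 1999 column vectors of two different lengths.
print(may_exceed_limit(10_000, 1999))   # roughly 0.16 GB of raw data
print(may_exceed_limit(200_000, 1999))  # roughly 3.2 GB of raw data
```

If the estimate is above the limit, gathering in smaller pieces or the disk-based workaround described above are the kinds of options to consider.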

Write a program to take dictionary from the keyboard and print sum of values?

d =dict(input('Enter a dictionary'))
sum = 0
for i in d.values():
    sum +=i
print(sum)
Input: Enter a dictionary{'a': 100, 'b':200, 'c':300}
This is the error that arises:
Traceback (most recent call last):
  File "G:/DurgaSoftPython/smath.py", line 2, in <module>
    d =dict(input('Enter a dictionary'))
ValueError: dictionary update sequence element #0 has length 1; 2 is required
You can't create a dict from a string using the dict constructor, but you can use ast.literal_eval:
from ast import literal_eval
d = literal_eval(input('Enter a dictionary'))
s = 0 # don't name your variable `sum` (which is a built-in Python function
      # you could've used to solve this problem)
for i in d.values():
    s +=i
print(s)
Output:
Enter a dictionary{'a': 100, 'b':200, 'c':300}
600
Using sum:
d = literal_eval(input('Enter a dictionary'))
s = sum(d.values())
print(s)
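Alternatively, parse the input as JSON (the keys must use double quotes):
import json
inp = input('Enter a dictionary')
inp = dict(json.loads(inp))
total = sum(inp.values())  # avoid shadowing the built-in sum
print(total)
Input: Enter a dictionary{"a": 100, "b":200, "c":300}
Output: 600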
The input function returns a string, so to get a valid Python dict you need to evaluate the input string and convert it into a dict.
One way to do this is with literal_eval from the ast module.
Here is an example:
from ast import literal_eval as le
d = le(input('Enter a dictionary: '))
_sum = 0
for i in d.values():
    _sum +=i
print(_sum)
Demo:
Enter a dictionary: {'a': 100, 'b':200, 'c':300}
600
PS: Another way is to use eval, but that is not recommended because it executes arbitrary code.
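Since literal_eval accepts any Python literal, not only dictionaries, it can be worth checking the type of the result before summing. A minimal sketch follows, with a hypothetical helper name sum_dict_values; it takes the text as an argument so it can be tried without typing at a prompt.

```python
from ast import literal_eval


def sum_dict_values(text):
    """Parse a dict literal such as "{'a': 1, 'b': 2}" and return the sum of its values."""
    parsed = literal_eval(text)
    if not isinstance(parsed, dict):
        raise ValueError("expected a dictionary literal, got " + type(parsed).__name__)
    return sum(parsed.values())


print(sum_dict_values("{'a': 100, 'b':200, 'c':300}"))  # prints 600
```

With keyboard input this would be called as sum_dict_values(input('Enter a dictionary: ')).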

How to append a loop value to a list?

I don't fathom why the output isn't a list...am I appending wrongly?
from numpy import *
b=0.1;g=0.5;l=632.8;p=2;I1=[I];I=0
for a in arange(-0.2,0.2,0.001):
    I+=b**2*(sin(pi/l*b*sin(a)))**2/(pi/l*b*sin(a))**2*(sin(p*pi /l*g*sin(a)))**2/(sin(pi/l*g*sin(a)))**2
I1.append(I)
print (I)
Output: 15.999998678557855
Several errors in your code, missing imports etc. See comments inside code for fixes:
from numpy import arange
from math import sin,pi
b = 0.1
g = 0.5
l = 632.8
p = 2
I = 0      # you need to specify I
I1 = [I]   # before you can add it
for a in arange(-0.2,0.2,0.001):
    I += b**2 * (sin(pi/l*b*sin(a)))**2 / (pi/l*b*sin(a))**2 * (sin(p*pi /l*g*sin(a)))**2 / (sin(pi/l*g*sin(a)))**2
    I1.append(I) # by indenting this you move it _inside_ the loop
print (I)
print (I1)
Output:
15.999998678557855
[0, 0.03999999014218294, 0.07999998038139602, 0.1199999707171788, ....] # shortened
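The effect of the indentation is easier to see with small numbers. The sketch below uses made-up terms instead of the formula from the question; it records the running total after every step and compares the result with itertools.accumulate from the standard library.

```python
from itertools import accumulate

terms = [1, 2, 3, 4]

# Appending inside the loop records the running total after every step.
total = 0
running = []
for t in terms:
    total += t
    running.append(total)

print(total)    # 10
print(running)  # [1, 3, 6, 10]

# itertools.accumulate produces the same running totals.
print(list(accumulate(terms)))
```

If the append call sat outside the loop, running would contain only the final total, which is what happened in the question.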

How do I sort a text file by column numerically?

from lxml import html
import operator
import discord
import yaml
import csv
raw_json = requests.get('https://bittrex.com/api/v1.1/public/getmarketsummaries').text
json_dict = json.loads(raw_json)
stuff = json_dict["result"]
new = []
for i in range(0,197):
    price = (stuff[i]['Last'])
    name1 = (stuff[i]['MarketName'])
    name = name1.replace("BTC-", "")
    prev = (stuff[i]['PrevDay'])
    diff = price - prev
    change = round(((price - prev) / price) * 100, 2)
    final = ('{0},{1}'.format(name,change))
    new.append(final)
butFirst = new[0:]
this1 = ("\n".join(butFirst))
text_file = open("Sort.txt", "w")
text_file.write(this1)
text_file.close()
I'm having problems sorting this output by the second column. I get base 10 errors, integer errors, etc. I think the problem is how the number is stored, but I can't figure it out.
The output looks like this:
1ST,-5.94
2GIVE,3.45
ABY,2.44
ADA,0.0
ADT,-4.87
ADX,-13.09
AEON,-2.86
AGRS,-2.0
You should avoid changing your data to text earlier than you need to. If you operate with a list of dictionaries it's very easy to sort the list.
import json
import csv
import requests
raw_json = requests.get('https://bittrex.com/api/v1.1/public/getmarketsummaries').text
json_dict = json.loads(raw_json)
stuff = json_dict["result"]
new = []
for i in range(0,197):
    price = float(stuff[i]['Last'])
    prev = float(stuff[i]['PrevDay'])
    # Use dictionary to hold the data
    d = {
        'name' : stuff[i]['MarketName'].replace("BTC-", ""),
        'change' : round(((price - prev) / price) * 100, 2)
    }
    new.append(d)
# The actual sorting part, sorting by change
sorted_list = sorted(new, key=lambda k: k['change'])
# Writing the dictionaries to file
# (newline="" is what the csv module recommends for files it writes)
with open("Sort.txt", "w", newline="") as text_file:
    dict_writer = csv.DictWriter(text_file, sorted_list[0].keys())
    # include the line below, if you want headers
    # dict_writer.writeheader()
    dict_writer.writerows(sorted_list)
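If the data is already in a text file in the "name,change" layout, it can still be sorted numerically by converting the second column only inside the sort key. This is a small sketch using the sample lines from the question; an in-memory io.StringIO stands in for Sort.txt. The "base 10" errors mentioned in the question are most likely what int() raises on a string such as '-5.94', which is why float() is used here.

```python
import csv
import io

# Sample lines in the same "name,change" layout as the output in the question.
text = """1ST,-5.94
2GIVE,3.45
ABY,2.44
ADA,0.0
ADT,-4.87
ADX,-13.09
"""

rows = list(csv.reader(io.StringIO(text)))
# Convert the second column to float so that -13.09 sorts below -5.94.
rows.sort(key=lambda row: float(row[1]))

for name, change in rows:
    print(f"{name},{change}")
```

Passing reverse=True to sort would list the largest change first.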
