I am generating a gradient picture from 10-bit data, and I have to pack it into 16-bit words for compression purposes. I use this code, but it is very slow. Is there a better way to do it?
import numpy as np

def gradient_pic(num,f):
    pic = np.zeros(2580,dtype= np.uint8)
    pixvalue = np.random.randint(0,50,size=1,dtype=np.uint8)[0]
    for i in range (0,1544):
        for chunk in range(0,258):
            # Pack eight consecutive 10-bit values into five 16-bit words
            pic[chunk*5] = (pixvalue<<6)+((pixvalue+1)>>4)
            pic[chunk*5+1] = (((pixvalue+1)&15)<<12) + ((pixvalue+2)<<2)+ ((pixvalue+3)>>8)
            pic[chunk*5+2] = (((pixvalue+3)&255)<<8) + ((pixvalue+4)>>2)
            pic[chunk*5+3] = (((pixvalue+4)&3)<<14) + ((pixvalue+5)<<4)+ ((pixvalue+6)>>6)
            pic[chunk*5+4] = (((pixvalue+6)&63)<<10) + pixvalue+7
            pixvalue = (pixvalue+8)%1023
        f.write(pic)
    print(num)
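The per-element Python loop is the likely bottleneck here. Below is a minimal sketch of the same bit layout done with NumPy array operations, assuming the input length is a multiple of 8 and every value fits in 10 bits. The function name pack_10bit_to_16bit and the test ramp are illustrative and not part of the original code. The output array is uint16 so that each packed word keeps all 16 bits.

```python
import numpy as np

def pack_10bit_to_16bit(values):
    """Pack 10-bit values (count must be a multiple of 8) into 16-bit words."""
    v = np.asarray(values, dtype=np.uint32).reshape(-1, 8)
    if np.any(v > 1023):
        raise ValueError("every value must fit in 10 bits")
    # Eight 10-bit values (80 bits) become five 16-bit words, MSB first.
    words = np.empty((v.shape[0], 5), dtype=np.uint32)
    words[:, 0] = (v[:, 0] << 6) | (v[:, 1] >> 4)
    words[:, 1] = ((v[:, 1] & 15) << 12) | (v[:, 2] << 2) | (v[:, 3] >> 8)
    words[:, 2] = ((v[:, 3] & 255) << 8) | (v[:, 4] >> 2)
    words[:, 3] = ((v[:, 4] & 3) << 14) | (v[:, 5] << 4) | (v[:, 6] >> 6)
    words[:, 4] = ((v[:, 6] & 63) << 10) | v[:, 7]
    return words.astype(np.uint16).ravel()

# Hypothetical ramp for one row: 258 chunks of 8 values, wrapped to 10 bits.
ramp = (37 + np.arange(8 * 258)) % 1024
row = pack_10bit_to_16bit(ramp)  # 1290 words, i.e. 2580 bytes per row
```

A whole row is packed in a handful of array operations, so the inner Python loop over chunks disappears; row.tofile(f) or f.write(row) then writes the words in the machine's native byte order.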
I am trying to use Julia as the main language for my work, but I find that its plot differs from the one Python produces (Python outputs the correct plot).
Here is the Python code and its output:
import numpy as np
import math
import matplotlib.pyplot as plt
u = 9.27*10**(-21)
k = 1.38*10**(-16)
j2 = 7/2
nrr = 780
h = 1000
na = 6*10**(23)
rho = 7.842
mgd = 157.25
a = mgd
d = na*rho*u/a
m_f = []
igd = 7.0
for t in range(1,401):
    while True:
        h1 = h+d*nrr*igd
        x2 = (7*u*h1)/(k*t)
        x4 = 2*j2
        q2 = (x4+1)/x4
        m = abs(7*(q2*math.tanh(q2*x2)**-1 - (1/x4)*math.tanh(x2/x4)**-1))
        if abs(m - igd) < 10**(-12):
            break
        else:
            igd = m
    m_f.append(abs(m))
plt.plot(range(1,401), m_f)
plt.savefig("Py_plot.pdf")
and it gives the following correct plot:
[Image: the correct plot, as expected]
But when I do the same calculation in Julia, it gives a different output from Python. Here is my Julia code:
using Plots
u = 9.27*10^(-21)
k = 1.38*10^(-16)
j2 = 7/2
nrr = 780
h = 1000
na = 6*10^(23)
rho = 7.842
mgd = 157.25
a = mgd
d = na*rho*u/a
igd = 7.0
m = 0.0
m_f = Float64[]
for t in 1:400
    while true
        h1 = h+d*nrr*igd
        x2 = (7*u*h1)/(k*t)
        x4 = 2*j2
        q2 = (x4+1)/x4
        m = 7*(q2*coth(rad2deg(q2*x2))-(1/x4)*coth(rad2deg(x2/x4)))
        if abs(abs(m)-igd) < 10^(-10)
            break
        else
            igd = m
        end
    end
    push!(m_f, abs(m))
end
plot(1:400, m_f)
and this is the unexpected Julia output:
[Image: the unexpected, wrong output from Julia]
Any help would be appreciated.
Code:
using Plots
const u = 9.27e-21
const k = 1.38e-16
const j2 = 7/2
const nrr = 780
const h = 1000
const na = 6.0e23
const rho = 7.842
const mgd = 157.25
const a = mgd
const d = na*rho*u/a
function plot_graph()
    igd = 7.0
    m = 0.0
    trange = 1:400
    m_f = Vector{Float64}(undef, length(trange))
    for t in trange
        while true
            h1 = h+d*nrr*igd
            x2 = (7*u*h1)/(k*t)
            x4 = 2*j2
            q2 = (x4+1)/x4
            m = abs(7*(q2*coth(q2*x2)-(1/x4)*coth(x2/x4)))
            if isapprox(m, igd, atol = 10^(-10))
                break
            else
                igd = m
            end
        end
        m_f[t] = m
    end
    plot(trange, m_f)
end
Plot:
Changes for correctness:
Changed na = 6*10^(23) to na = 6.0e23.
Since ^ has a higher precedence than *, 10^23 is evaluated first, and since the operands are Int values, the result is also an Int. However, Int (i.e. Int64) can only hold numbers up to approximately 9 * 10^18, so 10^23 overflows and gives a wrong result.
julia> 10^18
1000000000000000000
julia> 10^19 #overflow starts here
-8446744073709551616
julia> 10^23 #and gives a wrong value here too
200376420520689664
6.0e23 avoids this problem by directly using the scientific e-notation to create a literal Float64 value (Float64 can hold this value without overflowing).
Removed the rad2deg calls when calling coth. coth is a hyperbolic function that takes a plain real number, not an angle in degrees (the Python version likewise passes the raw value to math.tanh), so no conversion is needed.
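The wraparound shown in the REPL session above can be reproduced with plain modular arithmetic. The following sketch is written in Python, whose integers never overflow, and emulates what a signed 64-bit integer would hold. wrap_int64 is a hypothetical helper written for this illustration, not a library function.

```python
def wrap_int64(n: int) -> int:
    """Reduce n to the value a signed 64-bit integer holds after wraparound."""
    return (n + 2**63) % 2**64 - 2**63

print(wrap_int64(10**18))  # 1000000000000000000 (still fits)
print(wrap_int64(10**19))  # -8446744073709551616
print(wrap_int64(10**23))  # 200376420520689664
print(6.0e23)              # 6e+23, a float literal, so no integer overflow
```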
Other changes
Marked all the constants as const, and moved the rest of the code into a function. See Performance tip: Avoid non-constant global variables
Changed the abs(abs(m)-igd) < 10^(-10) check to isapprox(m, igd, atol = 10^(-10)), with abs now applied when m is computed. This performs essentially the same check, but is clearer and more flexible (for example, if you later want to switch to a relative tolerance, rtol).
Stored the 1:400 as a named variable trange. This is just because it's used multiple times, so it's easier to manage as a variable.
Changed m_f = Float64[] to m_f = Vector{Float64}(undef, length(trange)) (and the push! at the end to an assignment). If the size of the array is known beforehand (as it is in this case), it's better for performance to pre-allocate it with undef values and then assign to it.
Changed u and k to use the scientific e-notation as well, for consistency and clarity (thanks to @DNF for suggesting this notation in the comments).
I have estimated a nested logit model in R using the mlogit package. However, I ran into a problem when trying to estimate the marginal effects. Below is the code I used.
library(mlogit)
# data
data2 = read.csv(file = "neat_num_energy.csv")
new_ener2 <- mlogit.data(
  data2,
  choice="alter4", shape="long",
  alt.var="energy_altern",chid.var="id")
# estimate model
nest2 <- mlogit(
  alter4 ~ expendmaint + expendnegy |
    educ + sex + ppa_power_sp + hu_price_powersupply +
    hu_2price +hu_3price + hu_9price + hu_10price +
    hu_11price + hu_12price,
  data = data2,
  nests = list(
    Trad = c('Biomas_Trad', 'Solar_Trad'),
    modern = c('Biomas_Modern', 'Solar_Modern')
  ), unscaled=FALSE)
# create Z variable
z3 <- with(data2, data.frame(
  expendnegy = tapply(expendnegy, idx(nest2,2), mean),
  expendmaint= tapply(expendmaint, idx(nest2,2), mean),
  educ= mean(educ),
  sex = mean(sex),
  hu_price_powersupply = mean(hu_price_powersupply),
  ppa_power_sp = mean(ppa_power_sp),
  hu_2price = mean(hu_2price),
  hu_3price = mean(hu_3price),
  hu_9price = mean(hu_9price),
  hu_10price = mean(hu_10price),
  hu_11price = mean(hu_11price),
  ppa_power_sp = mean(ppa_power_sp),
  hu_12price = mean(hu_12price)
))
effects(nest2, covariate = "sex", data = z3, type = "ar")
#> Error in solve.default(H, g[!fixed]) :
#>   Lapack routine dgesv: system is exactly singular: U[6,6] = 0
My data is in long format. expendmaint and expendnegy are the only alternative-specific variables; every other variable is case-specific.
altern4 is a nominal variable representing each alternative.
I am working with big matrices (about 30k rows and ~100 columns). I am doing some matrix multiplication, and the process takes around 20 seconds. This is my code:
@time begin
    result = -1
    data = -1
    for i=1:size
        first_matrix = @view data[i * split,:]
        for j=1:size
            second_matrix = @view Qg[j * split,:]
            matrix_multiplication = first_matrix * second_matrix'
            current_sum = sum(matrix_multiplication)
            global result
            if current_sum > result
                result = current_sum
                data = matrix_multiplication[1,1]
            end
        end
    end
end
Trying to optimize this a little more, I tried using multi-threading (julia --threads 4) to get better performance.
@time begin
    global result = -1
    global data = -1
    lock = ReentrantLock()
    for i=1:size
        first_matrix = @view data[i * split,:]
        Threads.@threads for j=1:size
            second_matrix = @view Qg[j * split,:]
            matrix_multiplication = first_matrix * second_matrix'
            current_sum = sum(matrix_multiplication)
            global result
            if current_sum > result
                lock(lock)
                result = current_sum
                data = matrix_multiplication[1,1]
                unlock(lock)
            end
        end
    end
end
By adding multi-threading I thought I would get an increase in performance, but the performance got worse (~40 seconds). I removed the lock to see if that was the issue, but still got the same performance. I am running this on a Dual-Core Intel Core i5 (MacBook pro). Does anyone know why my multi-threading code doesn't work?
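One observation that is independent of threading: in Julia, indexing a single row as data[i * split, :] yields a 1-D vector, so first_matrix * second_matrix' is an outer product, and the sum of all entries of an outer product equals the product of the two vectors' sums. If that reading of the code is right, each pair needs only two precomputed row sums rather than a full temporary matrix. Below is a small Python sketch of the identity, with hypothetical helper names and toy integer data (with floating-point data the two sides agree only up to rounding).

```python
def outer_sum_bruteforce(v, w):
    """Sum every entry of the outer product of v and w explicitly."""
    return sum(a * b for a in v for b in w)

def best_pair(rows_a, rows_b):
    """Return (score, i, j) maximising the outer-product sum of two rows."""
    sums_a = [sum(r) for r in rows_a]  # computed once per row
    sums_b = [sum(r) for r in rows_b]
    return max((sa * sb, i, j)
               for i, sa in enumerate(sums_a)
               for j, sb in enumerate(sums_b))

rows_a = [[1, 2, 3], [4, 0, -1]]
rows_b = [[2, 2, 2], [0, 5, 2], [-3, 1, 1]]
score, i, j = best_pair(rows_a, rows_b)
print(score, i, j)  # 42 0 1
print(outer_sum_bruteforce(rows_a[i], rows_b[j]))  # 42
```

The entry matrix_multiplication[1,1] from the original loop corresponds to rows_a[i][0] * rows_b[j][0] here, so it can also be recovered without forming the matrix.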
I am working on a personal project of mine and was wondering how I can fix my issue.
Here is a piece of code I am working on:
f = open('sample.jpeg','rb')
choice = int(input("-> "))
mutableBytes = bytearray(f.read())
f.close()
print(str(mutableBytes) + "SAMPLE")
if choice == 1:
    for i in range(len(mutableBytes)):
        if mutableBytes[i] < 255:
            mutableBytes[i] = mutableBytes[i] + 1
    f.close()
    print(str(mutableBytes) + "ENCRYPTED")
    f = open('samplecopy.jpeg','wb+')
    f.write(mutableBytes)
else:
    f = open('samplecopy.jpeg','rb')
    mutableBytes = bytearray(f.read())
    f.close()
    for i in range(len(mutableBytes)):
        if mutableBytes[i] > 0 and mutableBytes[i]<255:
            mutableBytes[i] = mutableBytes[i] - 1
    f = open('samplecopy.jpeg','wb+')
    f.write(mutableBytes)
    print(str(mutableBytes) + "Decrypted")
In theory, this should take a picture, encrypt it, and then decrypt it. I printed all the bytes and looked for changes, but they look the same.
Here is the comparison: https://www.diffchecker.com/vTtzGe4O
And here is the image I get vs the original:
(the bottom one is the one I get after decrypting).
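One thing visible in the code itself: the +1 step skips bytes equal to 255, so an original 254 and an original 255 both end up as 255, and the -1 step (which also skips 255) cannot restore the 254. Below is a minimal sketch of a reversible variant, assuming a wrap-around shift modulo 256 is acceptable for this project. shift_bytes is a hypothetical helper, not part of the original code.

```python
def shift_bytes(data: bytes, delta: int) -> bytearray:
    """Add delta to every byte, wrapping around modulo 256."""
    return bytearray((b + delta) % 256 for b in data)

original = bytes(range(256))          # every possible byte value
scrambled = shift_bytes(original, 1)  # 255 wraps to 0 instead of being skipped
restored = shift_bytes(scrambled, -1)
print(restored == original)           # True
```

When writing the result to disk, opening the file in a with block ensures it is flushed and closed before it is read back.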
Problem: I am trying to convert a very big list (many millions) of private keys (hexadecimal format, stored in a list of strings) to addresses. Can this be run on the GPU?
I have tried looking for resources on how to adapt my code to a GPU/CUDA-friendly version. However, I've found that most examples online are for pure math operations on a list of ints or floats. Also, the function where the 'processing' is defined is also entirely re-written, and does not use functions from packages (other than those already supported by numpy etc.).
Is there a way to make the [private key -> public key -> address] process GPU-friendly, and can string operations be carried out on a GPU in the first place?
The following is what I have for my serial CPU version for Python3.x:
import codecs
import hashlib
import ecdsa
def get_pub_keys(priv_key):
    private_hex = codecs.decode(priv_key, 'hex')
    key = ecdsa.SigningKey.from_string(private_hex, curve=ecdsa.SECP256k1).verifying_key
    key_bytes = key.to_string()
    key_hex = codecs.encode(key_bytes, 'hex')
    public_key_uncompressed = b'04' + key_hex
    key_string = key_hex.decode('utf-8')
    last_byte = int(key_string[-1], 16)
    half_len = len(key_hex) // 2
    key_half = key_hex[:half_len]
    bitcoin_byte = b'02' if last_byte % 2 == 0 else b'03'
    public_key_compressed = bitcoin_byte + key_half
    return public_key_uncompressed, public_key_compressed
def public_to_address(public_key):
    public_key_bytes = codecs.decode(public_key, 'hex')
    # Run SHA256 for the public key
    sha256_bpk = hashlib.sha256(public_key_bytes)
    sha256_bpk_digest = sha256_bpk.digest()
    # Run ripemd160 for the SHA256
    ripemd160_bpk = hashlib.new('ripemd160')
    ripemd160_bpk.update(sha256_bpk_digest)
    ripemd160_bpk_digest = ripemd160_bpk.digest()
    ripemd160_bpk_hex = codecs.encode(ripemd160_bpk_digest, 'hex')
    # Add network byte
    network_byte = b'00'
    network_bitcoin_public_key = network_byte + ripemd160_bpk_hex
    network_bitcoin_public_key_bytes = codecs.decode(network_bitcoin_public_key, 'hex')
    # Double SHA256 to get checksum
    sha256_nbpk = hashlib.sha256(network_bitcoin_public_key_bytes)
    sha256_nbpk_digest = sha256_nbpk.digest()
    sha256_2_nbpk = hashlib.sha256(sha256_nbpk_digest)
    sha256_2_nbpk_digest = sha256_2_nbpk.digest()
    sha256_2_hex = codecs.encode(sha256_2_nbpk_digest, 'hex')
    checksum = sha256_2_hex[:8]
    # Concatenate public key and checksum to get the address
    address_hex = (network_bitcoin_public_key + checksum).decode('utf-8')
    wallet = base58(address_hex)
    return wallet
def base58(address_hex):
    alphabet = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'
    b58_string = ''
    # Get the number of leading zeros
    leading_zeros = len(address_hex) - len(address_hex.lstrip('0'))
    # Convert hex to decimal
    address_int = int(address_hex, 16)
    # Append digits to the start of string
    while address_int > 0:
        digit = address_int % 58
        digit_char = alphabet[digit]
        b58_string = digit_char + b58_string
        address_int //= 58
    # Add '1' for each 2 leading zeros
    ones = leading_zeros // 2
    for one in range(ones):
        b58_string = '1' + b58_string
    return b58_string
def get_addresses(i):
    key1,key2 = get_pub_keys(i)
    add1 = public_to_address(key1)
    add2 = public_to_address(key2)
    return add1, add2
filename = 'bunchOfHexKeys.txt'
with open(filename, 'r') as f:
    hexKeys = f.read().splitlines()
addresses = []
for i in hexKeys:
    addresses.append(get_addresses(i))
As can be seen, I'm using many functions from the 3 imported packages. So far the only way I can see is to rewrite those. Is there another way?
The size of hexKeys isn't an issue for the GPU cache size, as I can just adjust the input list as needed.
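On the string question: the hex strings in the code above are only a text representation, and the pipeline itself is hashing plus integer arithmetic on raw bytes. Below is a minimal sketch of the Base58Check step written directly on bytes, using only hashlib. The function names are illustrative, and this is a CPU reference for the byte-level logic under that assumption, not a GPU implementation.

```python
import hashlib

ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'

def base58_encode(data: bytes) -> str:
    """Base58-encode raw bytes; each leading zero byte becomes a '1'."""
    n = int.from_bytes(data, 'big')
    out = ''
    while n > 0:
        n, rem = divmod(n, 58)
        out = ALPHABET[rem] + out
    pad = len(data) - len(data.lstrip(b'\x00'))
    return '1' * pad + out

def base58check(version: bytes, payload: bytes) -> str:
    """Append the 4-byte double-SHA256 checksum, then Base58-encode."""
    body = version + payload
    checksum = hashlib.sha256(hashlib.sha256(body).digest()).digest()[:4]
    return base58_encode(body + checksum)

print(base58_encode(b'\x00\x01'))  # 12
print(base58_encode(b'\x3a'))      # 21
```

Working on bytes removes the repeated hex encode and decode round trips, and it makes clear which parts a GPU port would have to reimplement: the elliptic-curve multiplication, SHA-256, RIPEMD-160 and the base conversion.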