Is there any way to generate random letters in Python? I've come across code that generates random numbers, and a way to get all the letters from a to z.
For instance, the code below generates the following output:
import pandas as pd
import numpy as np
import string
ran1 = np.random.random(5)
print(ran1)
[0.79842166 0.9632492 0.78434385 0.29819737 0.98211011]
ran2 = string.ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'
However, I want to generate random letters where the input is the number of letters (for example 3) and the output is a list such as ['a', 'f', 'c']. Thanks in advance.
Convert the string of letters to a list and then use numpy.random.choice. You'll get an array back, but you can make that a list if you need.
import numpy as np
import string
np.random.seed(123)
list(np.random.choice(list(string.ascii_lowercase), 10))
#['n', 'c', 'c', 'g', 'r', 't', 'k', 'z', 'w', 'b']
As you can see, the default behavior is to sample with replacement. You can change that behavior if needed by adding the parameter replace=False.
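The same idea can be packaged as a function that takes the number of letters. This is a sketch using NumPy's newer Generator API (np.random.default_rng, available since NumPy 1.17) instead of the legacy np.random.seed/np.random.choice; the helper name random_letters and its seed parameter are illustrative, not part of NumPy.

```python
import string

import numpy as np


def random_letters(n, replace=True, seed=None):
    """Return n random lowercase letters as a list of str (hypothetical helper).

    replace=True samples with replacement (letters may repeat);
    replace=False draws n distinct letters, so n must be <= 26.
    """
    rng = np.random.default_rng(seed)
    letters = np.array(list(string.ascii_lowercase))
    picks = rng.choice(letters, size=n, replace=replace)
    return [str(c) for c in picks]  # plain str instead of numpy.str_


print(random_letters(3))                 # e.g. ['a', 'f', 'c']
print(random_letters(5, replace=False))  # five distinct letters
```

Passing the same seed gives the same letters, which is handy for reproducible tests.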
Here is an idea modified from https://pythontips.com/2013/07/28/generating-a-random-string/
import string
import random
def random_generator(size=6, chars=string.ascii_lowercase):
    return ''.join(random.choice(chars) for _ in range(size))
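random_generator returns a string, while the question asks for a list. A standard-library sketch that returns a list directly: random.choices (Python 3.6+) samples with replacement, and random.sample draws distinct letters. The function name random_letter_list and the unique flag are illustrative.

```python
import random
import string


def random_letter_list(n, unique=False):
    """Return a list of n random lowercase letters (hypothetical helper).

    unique=False: random.choices samples with replacement (repeats allowed).
    unique=True: random.sample draws n distinct letters (n must be <= 26).
    """
    if unique:
        return random.sample(string.ascii_lowercase, n)
    return random.choices(string.ascii_lowercase, k=n)


print(random_letter_list(3))           # e.g. ['a', 'f', 'c']
print(''.join(random_letter_list(6)))  # a 6-letter string, like random_generator()
```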
Related
I'd like to read a column from a CSV file and store its values in a list.
The CSV file currently looks like this:
Names
Tom
Ryan
John
The result I'm looking for is:
['Tom', 'Ryan', 'John']
Below is the code that I've written.
import csv
import pandas as pd
import time
# Declarations
UserNames = []
# Open a csv file using pandas
data_frame = pd.read_csv("analysts.csv", header=1, index_col=False)
names = data_frame.to_string(index=False)
# print(names)
# Iteration
for name in names:
    UserNames.append(name)
print(UserNames)
So far the result is as follows
['T', 'o', 'm', ' ', '\n', 'R', 'y', 'a', 'n', '\n', 'J', 'o', 'h', 'n']
Any help would be appreciated.
Thanks in advance
Hi, instead of converting your DataFrame to a string (iterating over a string yields single characters), you could convert the column to a list like this:
import pandas as pd
import csv
import time
df = pd.read_csv("analyst.csv", header=0)
names = df["Name"].to_list()
print(names)
Output: ['tom', 'tim', 'bob']
CSV file:
Name,
tom,
tim,
bob,
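I wasn't sure what your CSV really looks like, so you may have to adjust the arguments of read_csv. With the file shown in the question, the column is called Names and you should keep the default header=0; header=1 makes the Tom row the header.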
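A self-contained sketch using the exact file from the question, inlined with io.StringIO so it runs without analysts.csv. It shows the pandas route and, since the question already imports csv, a standard-library alternative with csv.DictReader.

```python
import csv
import io

import pandas as pd

# The file from the question, inlined so the example runs on its own.
CSV_TEXT = "Names\nTom\nRyan\nJohn\n"

# pandas: header=0 (the default) uses the first line, "Names", as the column name.
df = pd.read_csv(io.StringIO(CSV_TEXT))
names_pd = df["Names"].tolist()

# Standard library only: DictReader also takes the first row as field names.
reader = csv.DictReader(io.StringIO(CSV_TEXT))
names_csv = [row["Names"] for row in reader]

print(names_pd)   # ['Tom', 'Ryan', 'John']
print(names_csv)  # ['Tom', 'Ryan', 'John']
```

For a real file, replace io.StringIO(CSV_TEXT) with the path (pandas) or an open file object (csv).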
I create a pandas DataFrame from this CSV file and plot it as a pie chart:
import pandas as pd
df=pd.read_csv('Last_year.csv')
df.groupby('School Status').size().plot(kind='pie', autopct='%1.1f%%')
How can I fix the problem that gives me three slices in the pie chart instead of the two I expect?
Here is the result
It seems you have these three distinct values in that column:
df['School Status'].unique()
array(['IN SCHOOL', 'OUT OF SCHOOL', 'OUT OF SCHOOL '], dtype=object)
So if you remove the trailing whitespace from the last one, it should work properly.
Try this snippet:
import pandas as pd
df=pd.read_csv('Last_year.csv')
df['School Status'] = df['School Status'].replace({'OUT OF SCHOOL ': 'OUT OF SCHOOL'})
df.groupby('School Status').size().plot(kind='pie', autopct='%1.1f%%')
You may have extra spaces in one of the labels.
To check which labels pandas detects, you can use Series.unique().
You can remove leading and trailing whitespace, for example by applying str.strip element-wise:
import pandas as pd  # This code is from OP
df = pd.read_csv('Last_year.csv')  # This code is from OP
df['School Status'] = df['School Status'].map(str.strip)
# Now you can plot your DF
df.groupby('School Status').size().plot(kind='pie', autopct='%1.1f%%')  # This code is from OP
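A sketch on a small made-up DataFrame that reproduces the problem. It swaps in the vectorised Series.str.strip(), which leaves missing values as NaN, whereas map(str.strip) raises TypeError if the column contains NaN or None. Plotting is omitted so the sketch needs no display; counts.plot(kind='pie', autopct='%1.1f%%') would draw the two-slice chart.

```python
import pandas as pd

# Hypothetical data reproducing the problem: one label has a trailing space,
# and one row is missing.
df = pd.DataFrame({"School Status": ["IN SCHOOL", "OUT OF SCHOOL",
                                     "OUT OF SCHOOL ", "IN SCHOOL", None]})

# repr() makes the trailing whitespace visible.
print([repr(v) for v in df["School Status"].dropna().unique()])

# Trim both ends; missing values stay missing instead of raising.
df["School Status"] = df["School Status"].str.strip()

# groupby drops missing keys by default, leaving two groups.
counts = df.groupby("School Status").size()
print(counts)
```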
I have 10 .txt files, each containing a string:
A.txt: "This is a cat"
B.txt: "This is a dog"
.
.
J.txt: "This is an ant"
I want to read these files and put their words into a 2D array:
[['This', 'is', 'a', 'cat'],['This', 'is', 'a', 'dog']....['This', 'is', 'an', 'ant']]
from glob import glob
import numpy as np
for filename in glob('*.txt'):
    with open(filename) as f:
        data = np.genfromtxt(filename, dtype=str)
It's not working the way I want. Any help will be greatly appreciated.
You are creating a separate NumPy array for each text file and overwriting data on every iteration, so only the last file's array is kept. How about appending each file's words to a list and converting to NumPy at the end?
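from glob import glob
import numpy as np

data = []
for filename in sorted(glob('*.txt')):  # sorted so A.txt comes before B.txt
    with open(filename) as f:
        data.append(f.read().split())  # split on whitespace into words
data = np.array(data)  # rectangular only if every file has the same word count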
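A self-contained sketch of the same approach using pathlib, with three made-up files written to a temporary directory so it can run anywhere. The helper name read_word_rows is illustrative. One caveat: if the files contain different numbers of words, NumPy 1.24 and later raise ValueError for the ragged input to np.array, so in that case keep the list of lists.

```python
import tempfile
from pathlib import Path

import numpy as np


def read_word_rows(directory, pattern="*.txt"):
    """Return one list of words per matching file, in sorted filename order."""
    rows = []
    for path in sorted(Path(directory).glob(pattern)):
        rows.append(path.read_text().split())
    return rows


# Hypothetical sample files mirroring the question's A.txt ... J.txt.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "A.txt").write_text("This is a cat")
    Path(tmp, "B.txt").write_text("This is a dog")
    Path(tmp, "J.txt").write_text("This is an ant")
    rows = read_word_rows(tmp)

print(rows)
# Every row has 4 words here, so NumPy can build a rectangular 2-D array.
arr = np.array(rows)
print(arr.shape)  # (3, 4)
```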
I am getting the following error when calling inverse_transform on a LabelEncoder:
Traceback (most recent call last):
File "Test.py", line 31, in <module>
inverted = label_encoder.inverse_transform(integer_encoded['DEST'])
File "...\Python\Python36\lib\site-packages\sklearn\preprocessing\label.py", line 283, in inverse_transform
return self.classes_[y]
TypeError: only integer scalar arrays can be converted to a scalar index
The code that generates this error is the following:
import pandas as pd
import numpy as np
from collections import defaultdict
from sklearn import preprocessing
import bisect
data_cat = {'ORG': ['A', 'B', 'C', 'D'],
'DEST': ['A', 'E', 'F', 'G'],
'OP': ['F1', 'F1', 'F1', 'F2']}
data_cat = pd.DataFrame(data_cat)
# keep one LabelEncoder per column in a dictionary
label_encoder_dict = defaultdict(preprocessing.LabelEncoder)
integer_encoded = data_cat.apply(lambda x: label_encoder_dict[x.name].fit_transform(x))
print("Integer encoded: ")
print(integer_encoded)
#add a UNK class that will be used for the unseen values from the test dataset
for key, le in label_encoder_dict.items():
    le_classes = np.array(le.classes_).tolist()
    bisect.insort_left(le_classes, 'UNK')
    le.classes_ = le_classes
label_encoder = label_encoder_dict['DEST']
print(label_encoder.classes_)
print(integer_encoded['DEST'])
print(type (integer_encoded['DEST']))
inverted = label_encoder.inverse_transform(integer_encoded['DEST'])
print(inverted)
If I remove the for loop that adds the UNK class to every LabelEncoder, everything works fine. I don't understand why adding a new class affects the call to inverse_transform.
Thanks for any help or guidance.
LabelEncoder.inverse_transform is actually quite simple. The LabelEncoder object stores an array of the original values in its classes_ attribute, and each encoded integer is the index of its value in classes_. Normally, classes_ is a NumPy array, which supports indexing with an array of integer indices to get the values at those positions (inverse_transform ends with return self.classes_[y], as the traceback shows). In your for loop, however, you replaced it with a plain Python list, and indexing a list with an array raises exactly this TypeError.
If you change your for loop to keep le.classes_ as an ndarray, it should work:
for key, le in label_encoder_dict.items():
    le_classes = np.array(le.classes_).tolist()
    bisect.insort_left(le_classes, 'UNK')
    le.classes_ = np.asarray(le_classes)
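The difference can be reproduced with NumPy alone, without scikit-learn. In this sketch the arrays classes and codes are hypothetical stand-ins for the fitted classes_ of the 'DEST' encoder and its integer codes.

```python
import bisect

import numpy as np

# Stand-ins for label_encoder.classes_ and integer_encoded['DEST'].
classes = np.array(['A', 'E', 'F', 'G'])
codes = np.array([0, 1, 2, 3])

# Insert 'UNK' in sorted position, as the question's loop does.
as_list = classes.tolist()
bisect.insort_left(as_list, 'UNK')

# Indexing a plain list with an integer array fails with the TypeError
# from the traceback ("only integer scalar arrays can be converted ...").
try:
    as_list[codes]
    list_failed = False
except TypeError:
    list_failed = True

# An ndarray supports this kind of indexing, which is what
# inverse_transform relies on.
as_array = np.asarray(as_list)
decoded = as_array[codes]

print(list_failed)        # True
print(decoded.tolist())   # ['A', 'E', 'F', 'G']
```

This works here because 'UNK' sorts after every existing class, so it lands at the end and the existing codes keep their meaning. If some class sorted after 'UNK', insort_left would shift that class's index, and previously encoded integers would decode to the wrong label.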
On Windows 10, with versions:
Python 3.5.2, pandas 0.23.4, matplotlib 3.0.0, numpy 1.15.2,
the following code gives me a warning that I would like to fix:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.cm as cm
# a 5x4 random pandas DataFrame
pf = pd.DataFrame(np.random.random((5,4)), columns=['a', 'b', 'c', 'd'])
# colors:
colors = cm.rainbow(np.linspace(0, 1, 4))
fig1 = pf.plot.scatter('a', 'b', color='k')
for i, j in enumerate(['b', 'c', 'd']):
    pf.plot.scatter('a', j, color=colors[i+1], ax=fig1)
And I get a warning:
'c' argument looks like a single numeric RGB or RGBA sequence, which
should be avoided as value-mapping will have precedence in case its
length matches with 'x' & 'y'. Please use a 2-D array with a single
row if you really want to specify the same RGB or RGBA value for all
points.
Could you point me to how to address that warning?
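I can't reproduce the warning with matplotlib 3.0 and pandas 0.23.4, but what it says is essentially that you should not pass a single RGB or RGBA tuple as the color: if its length happened to match the number of points, matplotlib would treat it as numbers to map through a colormap rather than as one color. Wrapping it in a list makes it a 2-D array with a single row, which is unambiguous.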
So instead of color=colors[i+1] use
color=[colors[i+1]]
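Applied to the question's code, the fix looks like this sketch. The non-interactive Agg backend is an assumption added so it runs without a display; drop that line in a normal interactive session. The return value of pf.plot.scatter is an Axes, so it is named ax here rather than fig1.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.cm as cm
import numpy as np
import pandas as pd

# a 5x4 random pandas DataFrame, as in the question
pf = pd.DataFrame(np.random.random((5, 4)), columns=['a', 'b', 'c', 'd'])
colors = cm.rainbow(np.linspace(0, 1, 4))  # 4 RGBA rows, shape (4, 4)

ax = pf.plot.scatter('a', 'b', color='k')
for i, j in enumerate(['b', 'c', 'd']):
    # [colors[i + 1]] is a 1x4 "2-D array with a single row": one RGBA colour
    # for every point, so it cannot be mistaken for per-point values.
    pf.plot.scatter('a', j, color=[colors[i + 1]], ax=ax)
```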