Grouping categorical variable in dynamic dataframe in pandas - python-3.x

I have a categorical variable that holds states.
For each state, I want to create a dataframe named after that state and containing only that state's rows.
For example, DataAL should contain all the rows for the state AL.
Code 1:
l = []
for i in a:
    print(i)
    l[i] = df4[df4["Provider State"].str.contains(i)]
    l[i] = pd.DataFrame(l[i])
    l[i].head()
TypeError: list indices must be integers or slices, not str
Code 2:
l = []
for i in range(len(a)):
    print(i)
    l[i] = df4[df4["Provider State"].str.contains(a[i])]
    l[i] = pd.DataFrame(l[i])
    l[i].head()
IndexError: list assignment index out of range
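Both errors come from assigning to positions of an empty list: a list cannot be indexed by a string (Code 1), and l[i] = ... cannot create a position that does not exist yet (Code 2). A dict keyed by name is one way around this. The following is only a sketch under assumptions: the small df4 and its "Total Discharges" column are made up for illustration, and groupby matches exact state codes, whereas str.contains in the question matches substrings.

```python
import pandas as pd

# Hypothetical stand-in for df4; only "Provider State" comes from the question.
df4 = pd.DataFrame({
    "Provider State": ["AL", "AK", "AL", "AZ"],
    "Total Discharges": [91, 14, 38, 25],
})

# One dataframe per state, stored under a name such as "DataAL".
frames = {
    f"Data{state}": group.reset_index(drop=True)
    for state, group in df4.groupby("Provider State")
}

print(frames["DataAL"].head())
```

Looking frames up by key, as in frames["DataAL"], gives the per-state data without creating variable names dynamically.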

Related

problem with dictionaries inserted inside a list

I've done a lot of research but I just can't understand it. I'm trying to write a small program that collects currencies from a dictionary and puts them in a list, but when I print the list I get the error shown after the code:
l_report = []
dict_report = {}

def maximum_volume():
    global dict_report
    list_volume = [] #LIST CONTAINING CURRENCY VOLUMES
    for elements in currencies:
        list_volume.append(elements['quote']['USD']['volume_24h'])
    for elements in currencies:
        if max(list_volume) == elements['quote']['USD']['volume_24h']:
            dict_report ={
                'max_volume' : elements['name']
            }
            l_report.append(dict_report)

def top10():
    global dict_report
    list_top = [] #LIST CONTAINING THE VALUES OF CURRENCIES WITH A POSITIVE INCREMENT
    for elements in currencies:
        if elements['quote']['USD']['percent_change_24h'] > 0:
            list_top.append(elements['quote']['USD']['percent_change_24h'])
    top10_sorted = sorted(list_top, key=lambda x: float(x), reverse=True)
    for num in top10_sorted[:10]:
        for elements in currencies:
            if num == elements['quote']['USD']['percent_change_24h']:
                dict_report={
                    'name' : elements['name'],
                    'percent': num
                }
                l_report.append(dict_report)

maximum_volume()
top10()
for k in l_report:
    print(k['name'])
KeyError: 'name'
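Judging only from the code shown, maximum_volume() appends a dict whose only key is 'max_volume', while top10() appends dicts with 'name' and 'percent'. The final loop reads k['name'] from every entry, so the first entry raises the KeyError. A minimal sketch of one way to handle the mixed entries, with made-up sample values standing in for the real API data:

```python
# Hypothetical entries with the two shapes the functions in the question append.
l_report = [
    {"max_volume": "Bitcoin"},               # shape appended by maximum_volume()
    {"name": "Ethereum", "percent": 4.2},    # shape appended by top10()
    {"name": "Cardano", "percent": 3.1},
]

# Only read 'name' from the entries that actually have that key.
names = [entry["name"] for entry in l_report if "name" in entry]

for name in names:
    print(name)
```

Keeping the maximum-volume result in its own variable, rather than in the same list as the top-10 entries, would avoid the mixed shapes altogether.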

"TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed"

# Print out a nice report
print("\nData Quality Report")
print("Total records: {}".format(len(cust_dmgc.index)))
data_quality_report
output:
I want to get the above output from a user-defined function. I tried to write one like this:
def DataQuality_Report(df):
    # columns
    globals() [columns] = pd.DataFrame(list(df.columns))
    #Data Types
    globals() [data_types] = pd.DataFrame(df.dtypes, columns=['DataType'])
    #Missing Values
    globals() [missing_data_counts] = pd.DataFrame(df.isnull().sum(), columns= ['Missing Values'])
    #Missing Percentage
    globals() [missing_data_pct] = pd.DataFrame(((df.isnull().sum() / df.shape[0]) * 100).round(2) , columns= ['Missing Values %'])
    #Present Counts
    globals() [present_data_counts] = pd.DataFrame(df.count(), columns= ['Present Values'])
    #No. of Unique Values
    globals() [unique_value_counts] = pd.DataFrame(columns= ['Unique Values'])
    for v in list(df.columns.values):
        unique_value_counts.loc[v] = [df[v].nunique()]
    #Minimum Values for numeric data
    globals() [minimum_values] = pd.DataFrame(columns= ['Minimum Value'])
    for v in list(df.columns.values):
        if df[v].dtype != 'O':
            minimum_values.loc[v] = [df[v].min()]
        else:
            minimum_values.loc[v] = "-"
    #Maximum Values for numeric data
    globals() [maximum_values] = pd.DataFrame(columns= ['Maximum Value'])
    for v in list(df.columns.values):
        if df[v].dtype != 'O':
            maximum_values.loc[v] = [df[v].max()]
        else:
            maximum_values.loc[v] = "-"
    # Merge all the dataframes together by the index
    globals() [data_quality_report] = data_types.join(present_data_counts).join(missing_data_counts)\
        .join(missing_data_pct).join(unique_value_counts).join(minimum_values).join(maximum_values)
    #Output
    print("\nData Quality Report")
    print("Total records: {}".format(len(df.index)))
    data_quality_report
This raises the error "'DataFrame' objects are mutable, thus they cannot be hashed".
I used globals() because I want to use those individual dataframes outside the function.
How can I fix this? Any guidance would be very helpful.
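A likely cause, assuming dataframes named columns, data_types and so on already exist at module level from the earlier non-function version: globals()[columns] uses the value of columns (a DataFrame) as the dict key, and dict keys must be hashable. Writing globals()['columns'] with a string key would avoid the error, but returning the pieces from the function is usually easier to follow. The sketch below takes that route. The sample dataframe is made up, and select_dtypes(include="number") is a swapped-in simplification of the question's dtype != 'O' check.

```python
import pandas as pd

def data_quality_report(df):
    """Return (report, parts): the joined report plus each piece by name."""
    numeric_cols = df.select_dtypes(include="number").columns

    def numeric_only(series, label):
        # Non-numeric columns get "-" as in the question's loops.
        filled = series.astype(object).reindex(df.columns, fill_value="-")
        return filled.to_frame(label)

    parts = {
        "data_types": df.dtypes.astype(str).to_frame("DataType"),
        "present_data_counts": df.count().to_frame("Present Values"),
        "missing_data_counts": df.isnull().sum().to_frame("Missing Values"),
        "missing_data_pct": (df.isnull().mean() * 100).round(2).to_frame("Missing Values %"),
        "unique_value_counts": df.nunique().to_frame("Unique Values"),
        "minimum_values": numeric_only(df[numeric_cols].min(), "Minimum Value"),
        "maximum_values": numeric_only(df[numeric_cols].max(), "Maximum Value"),
    }
    # All pieces are indexed by column name, so they line up side by side.
    report = pd.concat(list(parts.values()), axis=1)
    return report, parts

# Hypothetical sample data, for illustration only.
sample = pd.DataFrame({"age": [30, 41, None], "city": ["Pune", "Pune", "Delhi"]})
report, parts = data_quality_report(sample)

print("\nData Quality Report")
print("Total records: {}".format(len(sample.index)))
print(report)
```

The individual dataframes stay available outside the function through parts, for example parts["missing_data_pct"], without touching globals().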

Why does every value I put into a list using insert have the data type string?

"""
Every value I put into the list using insert ends up with the data type string. I also insert each value at index -1, but the list is not ordered the way I expected.
"""
lis = []
def takeinput():
    x = int(input("Enter how many element you want to put into list"))
    while x != 0:
        lis.insert(-1,input())
        x = x-1
    return lis
takeinput()
type(lis[0])
insert doesn't change the datatype of what it inserts into the list. Integers remain integers, strings remain strings:
>>> lis = [1,2,3,4]
>>> lis.insert(-1, 5)
>>> lis
[1, 2, 3, 5, 4]
As you can see above, the new value lands in the second-to-last position, because insert(-1, x) places x before the last element.
That is probably not what you want.
Normally you would use append instead of insert to fill a list.
lis = []
new_var = 'foo'
if new_var not in lis:
    lis.append(new_var)
# ['foo']
Your list contains strings because with
lis.insert(-1,input())
you insert the string returned by input() before the last element of your list.
input() always returns a string.
If you want to store another datatype, you need to convert the input to that datatype before storing it in your list.
lis = []
def takeinput():
    x = int(input("Enter how many element you want to put into list"))
    while x != 0:
        new_item = input() # store input string
        new_item = int(new_item) # cast it to int for example
        lis.append(new_item) # append it to the list
        x = x-1
    return lis
takeinput()
type(lis[0])
In Python, a list can hold objects of any type.
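To see why the list comes out in an unexpected order with insert(-1, ...), here is a small comparison. The entered values are hard-coded as strings, standing in for what input() would return, so that the snippet runs without a prompt.

```python
# Stand-ins for four values typed at the prompt; input() would return strings.
entered = ["1", "2", "3", "4"]

with_insert = []
with_append = []
for text in entered:
    number = int(text)              # convert before storing
    with_insert.insert(-1, number)  # goes BEFORE the current last element
    with_append.append(number)      # goes at the end

print(with_insert)  # [2, 3, 4, 1]
print(with_append)  # [1, 2, 3, 4]
```

The first value stays last with insert(-1, ...) because every later value is placed in front of it.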

I want to make a dictionary of trigrams out of a text file, but something is wrong and I do not know what it is

I have written a program that counts the trigrams that occur 5 times or more in a text file. The trigrams should be printed in order of frequency.
I cannot find the problem!
I get the following error message:
list index out of range
I have tried to make the range bigger, but that did not work.
f = open("bsp_file.txt", encoding="utf-8")
text = f.read()
f.close()
words = []
for word in text.split():
    word = word.strip(",.:;-?!-–—_ ")
    if len(word) != 0:
        words.append(word)
trigrams = {}
for i in range(len(words)):
    word = words[i]
    nextword = words[i + 1]
    nextnextword = words[i + 2]
    key = (word, nextword, nextnextword)
    trigrams[key] = trigrams.get(key, 0) + 1
l = list(trigrams.items())
l.sort(key=lambda x: x[1])
l.reverse()
for key, count in l:
    if count < 5:
        break
    word = key[0]
    nextword = key[1]
    nextnextword = key[2]
    print(word, nextword, nextnextword, count)
The result should look like this (simplified):
s = "this is a trigram which is an example............."
this is a
is a trigram
a trigram which
trigram which is
which is an
is an example
As the comments pointed out, you iterate over your list words with i and access words[i + 1] and words[i + 2]. When i reaches the last two cells of words, those indices are out of range.
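The smallest change to the original loop is to stop two positions before the end, so that i + 2 is always a valid index. A minimal sketch, using the sample sentence from the question in place of the file contents:

```python
# Sample sentence from the question, used here instead of reading the file.
words = "this is a trigram which is an example".split()

trigrams = {}
# Stop at len(words) - 2 so that words[i + 2] never runs past the end.
for i in range(len(words) - 2):
    key = (words[i], words[i + 1], words[i + 2])
    trigrams[key] = trigrams.get(key, 0) + 1

for key in trigrams:
    print(" ".join(key))
```

With 8 words this yields the 6 trigrams listed in the expected result.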
I suggest you read this tutorial to generate n-grams with pure python: http://www.albertauyeung.com/post/generating-ngrams-python/
Answer
If you don't have time to read it all, here's the function I recommend, adapted from the link:
def get_ngrams_count(words, n):
    # generates a list of Tuples representing all n-grams
    ngrams_tuple = zip(*[words[i:] for i in range(n)])
    # turn the list into a dictionary with the counts of all ngrams
    ngrams_count = {}
    for ngram in ngrams_tuple:
        if ngram not in ngrams_count:
            ngrams_count[ngram] = 0
        ngrams_count[ngram] += 1
    return ngrams_count
trigrams = get_ngrams_count(words, 3)
Please note that you can make this function a lot simpler by using a Counter (which subclasses dict, so it will be compatible with your code) :
from collections import Counter
def get_ngrams_count(words, n):
    # turn the list into a dictionary with the counts of all ngrams
    return Counter(zip(*[words[i:] for i in range(n)]))
trigrams = get_ngrams_count(words, 3)
Side Notes
You can use the bool argument reverse in .sort() to sort your list from most common to least common:
l = list(trigrams.items())
l.sort(key=lambda x: x[1], reverse=True)
This is a tad faster than sorting your list in ascending order and then reversing it with .reverse().
A more generic function for the printing of your sorted list (will work for any n-grams and not just tri-grams):
for ngram, count in l:
    if count < 5:
        break
    # " ".join(ngram) will combine all elements of ngram in a string, separated with spaces
    print(" ".join(ngram), count)
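If a Counter is used for the counts, its most_common() method already returns the items sorted from most to least frequent, so the manual sort can be skipped. A small sketch with a made-up word list; min_count is set to 2 here only so the short sample produces output, whereas the question uses 5.

```python
from collections import Counter

# Made-up word list, short enough to check by hand.
words = "a b c a b c a b c".split()

trigram_counts = Counter(zip(words, words[1:], words[2:]))

min_count = 2  # the question uses 5; lowered for this small sample
lines = [
    f"{' '.join(ngram)} {count}"
    for ngram, count in trigram_counts.most_common()
    if count >= min_count
]

for line in lines:
    print(line)
```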

how to update contents of file in python

def update():
    global mylist
    i = j = 0
    mylist[:]= []
    key = input("enter student's tp")
    myf = open("data.txt","r+")
    ml = myf.readlines()
    #print(ml[1])
    for line in ml:
        words = line.split()
        mylist.append(words)
    print(mylist)
    l = len(mylist)
    w = len(words)
    print(w)
    print(l)
    for i in range(l):
        for j in range(w):
            print(mylist[i][j])
            ## if(key == mylist[i][j]):
            ##     print("found at ",i,j)
            ##     del mylist[i][j]
            ##     mylist[i].insert((j+1), "xxx")
below is the error
print(mylist[i][j])
IndexError: list index out of range
I am trying to update the contents of a file. I read the file into a list of lines, and each line is then stored as a list of words, so mylist is a 2D list, but indexing it gives me the error above.
Your w variable is the length of the last line's word list. Other lines could be shorter, so mylist[i][j] can run past the end of a shorter line.
A better idiom is to iterate over a list directly with a for loop instead of indexing it.
But there is an even better way.
It appears you want to replace a "tp" (whatever that is) with the string xxx everywhere. A quicker way to do that is to use regular expressions.
import re
# key is the search string read earlier with input()
with open('data.txt') as myf:
    myd = myf.read()
newd = re.sub(key, 'xxx', myd)
with open('newdata.txt', 'w') as newf:
    newf.write(newd)
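For comparison, here is a sketch of the "iterate directly" idiom applied to the original word-by-word approach. Because the loops walk over each line's own words, lines of different lengths cause no IndexError. The lines and the key are made-up sample data held in memory, not read from data.txt, and the match is on whole words only.

```python
# Hypothetical file contents; the lines deliberately have different lengths.
lines = ["tp001 alice 90", "tp002 bob", "tp003 carol 75 extra"]
key = "tp002"  # stands in for the value read with input()

updated = []
for line in lines:
    # Walk over the words of this line only; no index arithmetic needed.
    words = ["xxx" if word == key else word for word in line.split()]
    updated.append(" ".join(words))

print(updated)
```

Unlike re.sub, this leaves a longer word that merely contains the key untouched.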

Resources