Task: Find the unique elements in an array, count their occurrences, and find the numbers that occur fewer than 10 times in an array of 5000 elements (python-3.x)

I tried a few solutions:
1.)
uniqueValues, indexList, occurCount = np.unique(desired_array,
                                                return_index=True, return_counts=True)
print(uniqueValues, indexList, occurCount)
However, indexList only gives the first occurrence of each number. For example, if 33 occurred at indices 20, 56 and 3000, indexList would only show that it occurred at 20. Since 33 occurs fewer than 10 times (3 times), I need all of its locations.
2.) I decided to use a dictionary to find all the index locations, but this is not working.
for i in range(5000):
    if not d.get(i):
        d[desired_array[i]] = [i]
    else:
        indices = d[desired_array[i]]
        indices.append(i)

This job screams for collections.Counter:
from collections import Counter
desired_array = [1, 2, 3, 1, 3, 5, 3]
result = Counter(desired_array)
print(result)
This will print out the unique elements and the count of occurrences:
Counter({3: 3, 1: 2, 2: 1, 5: 1})
You can replace
for i in range(1250):
    var = desired_array[i]
    if not d.get(var):
        d[var] = []
        # print(var)
    s = d[var]
    s.append(i)
with
for i in range(1250):
    var = desired_array[i]
    d.setdefault(var, []).append(i)
According to the documentation dict.setdefault(key, default):
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
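Putting the pieces together, here is a minimal sketch of the full task: collect every index for each value with setdefault, then keep only the values that occur fewer than 10 times. The function name rare_value_indices, the max_count parameter and the sample data are assumptions made for illustration, not part of the original question.

```python
def rare_value_indices(values, max_count=10):
    """Map each value that occurs fewer than max_count times to all of its indices."""
    positions = {}
    for index, value in enumerate(values):
        # setdefault creates the list on first sight, then we append to it.
        positions.setdefault(value, []).append(index)
    return {value: idxs for value, idxs in positions.items() if len(idxs) < max_count}

# Hypothetical sample: 7 appears twelve times, 33 appears three times.
sample = [7] * 12
sample.insert(2, 33)
sample.insert(5, 33)
sample.append(33)

rare = rare_value_indices(sample)
print(rare)
```

The length of each index list doubles as the occurrence count, so a separate Counter is not needed when the indices are collected anyway.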
To write a CSV file, it's best to use the standard library's csv.writer:
import csv
with open('some.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(someiterable)
If you want to write the key/value pairs of your dict to the CSV file, you need to write something like:
with open('some.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for k, v in d.items():
        writer.writerow((k, v))
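As a rough illustration of what such rows might look like, the sketch below writes a small index dictionary to an in-memory buffer instead of a file, so the result can be inspected directly. The sample dictionary, the column names and the choice to join the indices with spaces are all assumptions.

```python
import csv
import io

# Hypothetical index dictionary of the kind built with setdefault.
d = {33: [20, 56, 3000], 8: [4]}

buffer = io.StringIO(newline='')
writer = csv.writer(buffer)
writer.writerow(('value', 'count', 'indices'))
for value, indices in d.items():
    # Join the indices so that each value occupies a single row.
    writer.writerow((value, len(indices), ' '.join(map(str, indices))))

csv_text = buffer.getvalue()
print(csv_text)
```

Swapping the buffer for open('some.csv', 'w', newline='') writes the same rows to disk.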

Related

column comprehension robust to missing values

I have only been able to create a two column data frame from a defaultdict (termed output):
df_mydata = pd.DataFrame([(k, v) for k, v in output.items()],
                         columns=['id', 'value'])
What I would like to do, using this basic format, is initialize the dataframe with three columns: 'id', 'id2' and 'value'. I have a separately defined dict, called id_lookup, that contains the necessary lookup info.
So I tried:
df_mydata = pd.DataFrame([(k, id_lookup[k], v) for k, v in output.items()],
                         columns=['id', 'id2', 'value'])
I think I'm doing it right, but I get key errors. I will only know in hindsight whether id_lookup is exhaustive for all possible encounters. For my purposes, simply putting it all together and placing 'N/A' or something similar for those errors would be acceptable.
Would the above be appropriate for calculating a new column of data using a defaultdict and a simple lookup dict, and how might I make it robust to key errors?
Here is an example of how you could do this:
import pandas as pd
from collections import defaultdict
df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'value': [10, 20, 30, 40]})
id_lookup = {1: 'A', 2: 'B', 3: 'C'}
new_column = defaultdict(str)
# Loop through the df and populate the defaultdict
for index, row in df.iterrows():
    try:
        new_column[index] = id_lookup[row['id']]
    except KeyError:
        new_column[index] = 'N/A'
# Convert the defaultdict to a Series and add it as a new column in the df
df['id2'] = pd.Series(new_column)
# Print the updated DataFrame
print(df)
which gives:
   id  value  id2
0   1     10    A
1   2     20    B
2   3     30    C
3   4     40  N/A
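If you would rather keep the original list comprehension, one possible alternative is dict.get, which returns a fallback value instead of raising KeyError. The sketch below only builds the row tuples; the contents of output and id_lookup are made-up stand-ins for the question's data.

```python
from collections import defaultdict

# Hypothetical stand-ins for the question's `output` and `id_lookup`.
output = defaultdict(int, {1: 10, 2: 20, 3: 30, 4: 40})
id_lookup = {1: 'A', 2: 'B', 3: 'C'}

# dict.get returns the fallback instead of raising KeyError for unknown ids.
rows = [(k, id_lookup.get(k, 'N/A'), v) for k, v in output.items()]
print(rows)
# `rows` can then be passed to pd.DataFrame(rows, columns=['id', 'id2', 'value']).
```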

Reading input as dictionary in python

I'm trying to read input spread across multiple lines in the form of a dictionary and apply simple math operations to the dictionary's values. My code reads:
d = {}
bal = 0
text = input().split(",")  # split the input text on commas and store the pieces in the list 'text'
print(text)
for i in range(5):
    text1 = text[i].split(" ")  # split each piece on spaces and store the parts in the list 'text1'
    d[text1[0]] = int(text1[1])  # assign the 1st item as the key and the 2nd item as the value of the dictionary
print(d)
for key in d:
    if key == 'D':
        bal = bal + int(d[key])
        # print(d[key])
    elif key == 'W':
        bal = bal - int(d[key])
print(bal)
Input: W 300,W 200,D 100,D 400,D 600
Output: {'D': 600, 'W': 200}
400
Expected output: {'W':300,'W':200,'D':100,'D':400,'D':600}
600
Issue: the code always keeps only the second and last values. For example, in the above case the output is
{'D': 600, 'W': 200}
400
Can someone tell me what is wrong with the for loop?
Thanks in advance
You can do this more simply while keeping your own approach. @Rakesh and @Sabesh made good suggestions. A dictionary is a collection with unique, immutable (hashable) keys, so assigning to a key that already exists overwrites the earlier value. You can check this in your Python interactive console by executing help(dict).
See https://docs.python.org/2/library/collections.html#collections.defaultdict for a number of examples of how to use dictionaries efficiently.
>>> d = {}
>>> text = 'W 300,W 200,D 100,D 400,D 600'
>>>
>>> for item in text.split(","):
...     arr = item.split()
...     d.setdefault(arr[0], []).append(arr[1])
...
>>> d
{'W': ['300', '200'], 'D': ['100', '400', '600']}
>>>
>>> withdrawals = [int(n) for n in d['W']]
>>> deposits = [int(n) for n in d['D']]
>>>
>>> bal = sum(deposits) - sum(withdrawals)
>>> bal
600
>>>
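If only the final balance matters, the intermediate dictionary can be skipped entirely. The following is a minimal sketch under that assumption; the function name balance is invented for illustration, and the input format is assumed to match the example above.

```python
def balance(transactions: str) -> int:
    """Return deposits minus withdrawals for input like 'W 300,D 100'."""
    total = 0
    for item in transactions.split(","):
        kind, amount = item.split()
        if kind == "D":
            total += int(amount)
        elif kind == "W":
            total -= int(amount)
    return total

print(balance("W 300,W 200,D 100,D 400,D 600"))
```

Because each transaction is applied as it is read, repeated 'W' and 'D' entries are never overwritten.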

Python Pandas Create Multiple dataframes by slicing data at certain locations

I am new to Python and to data analysis by programming. I have a long CSV file and would like to create DataFrames dynamically and plot them later on. Here is an example DataFrame similar to the data in my CSV file:
df = pd.DataFrame(
    {"a": [4, 5, 6, 'a', 1, 2, 'a', 4, 5, 'a'],
     "b": [7, 8, 9, 'b', 0.1, 0.2, 'b', 0.3, 0.4, 'b'],
     "c": [10, 11, 12, 'c', 10, 20, 'c', 30, 40, 'c']})
As seen, there are elements that repeat in each column. So I first need to find the indices of the repetitions and then use them to make subsets. Here is how I did this:
find_Repeat = df.groupby(['a'], group_keys=False).apply(
    lambda df: df if df.shape[0] > 1 else None)
repeat_idxs = find_Repeat.index[find_Repeat['a'] == 'a'].tolist()
If I print repeat_idxs, I would get
[3, 6, 9]
And this is an example of what I want to achieve in the end:
dfa_1 = df['a'][Index_Identifier[0]:Index_Identifier[1]]
dfa_2 = df['a'][Index_Identifier[1]:Index_Identifier[2]]
dfb_1 = df['b'][Index_Identifier[0]:Index_Identifier[1]]
dfb_2 = df['b'][Index_Identifier[1]:Index_Identifier[2]]
But this is neither efficient nor convenient, as I need to create many DataFrames like these for plotting later on. So I tried the following method:
dfNames = ['dfa_' + str(i) for i in range(len(repeat_idxs))]
dfs = dict()
for i, row in enumerate(repeat_idxs):
    dfName = dfNames[i]
    slices = df['a'].loc[row:row+1]
    dfs[dfName] = slices
If I print dfs, this is exactly what I want.
{'dfa_0': 3    a
4    1
Name: a, dtype: object, 'dfa_1': 6    a
7    4
Name: a, dtype: object, 'dfa_2': 9    a
Name: a, dtype: object}
However, when I read my CSV file and apply the above, I do not get the desired result. I can find the repeated indices from the CSV file, but I am not able to slice the data properly. I presume that I am not reading the CSV file correctly. I attached the CSV file for further clarification.
Two options:
Loop over and slice
Detect the repeat row indices and then loop over them to slice contiguous chunks of the dataframe, ignoring the repeat rows:
# detect rows for which all values are equal to the column names
repeat_idxs = df.index[(df == df.columns.values).all(axis=1)]
slices = []
start = 0
for i in repeat_idxs:
    slices.append(df.loc[start:i - 1])
    start = i + 1
The result is a list of dataframes, slices, which holds the slices of your data in order. Note that rows after the last repeat row, if any, are not collected by this loop.
Use pandas groupby
You could also do this in one line using pandas groupby if you prefer:
grouped = df[~(df == df.columns.values).all(axis=1)].groupby((df == df.columns.values).all(axis=1).cumsum())
And you can now iterate over the groups like so:
for i, group_df in grouped:
    # do something with group_df
    pass
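For reference, here is a self-contained sketch of the groupby option applied to the example DataFrame from the question. Collecting the chunks into a dict and naming them chunk_0, chunk_1 and so on is an assumption about how you might want to refer to them later for plotting.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [4, 5, 6, 'a', 1, 2, 'a', 4, 5, 'a'],
     "b": [7, 8, 9, 'b', 0.1, 0.2, 'b', 0.3, 0.4, 'b'],
     "c": [10, 11, 12, 'c', 10, 20, 'c', 30, 40, 'c']})

# True for rows whose values all equal the column names (the repeated header rows).
is_repeat = (df == df.columns.values).all(axis=1)

# Number each chunk by how many header rows precede it, then drop the header rows.
chunk_id = is_repeat.cumsum()[~is_repeat]
chunks = {f"chunk_{key}": group for key, group in df[~is_repeat].groupby(chunk_id)}

for name, chunk in chunks.items():
    print(name, chunk.index.tolist())
```

Each value in chunks is a regular DataFrame, so a single column can be pulled out with, for example, chunks['chunk_1']['a'].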

how to print only duplicate numbers in a list?

I need to print only the duplicate numbers in a list and multiply each one by its count. The output should be
{1:3, 2:2, 3:2}, and I need to multiply each number by its count and print the results as separate answers:
answer1 = 1*3, answer2 = 2*2, answer3 = 3*2
Current attempt:
from collections import Counter
a_list = [1,2,3,5,1,2,1,3,1,2]
a = dict(Counter(a_list))
print(a)
Counter already does the heavy lifting. For the rest, how about generating a list of the values occurring more than once and formatting the output as you wish? (Sorting the keys seems necessary so that the list indexes match the key order.)
from collections import Counter
a_list = [1,2,3,5,1,2,1,3,1,2]
a = ["{}*{}".format(k,v) for k,v in sorted(Counter(a_list).items()) if v > 1]
print(a)
result:
['1*4', '2*3', '3*2']
If you want the numerical result instead:
a = [k*v for k,v in sorted(Counter(a_list).items()) if v > 1]
result (probably more useful):
[4, 6, 6]
Assigning to separate variables (answer1, answer2, answer3 = a) is not a very good idea. Keep an indexed list.
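If looking results up by the original number is more convenient than by position, a dict comprehension is one possible variation. This is only a sketch; the name products is an assumption.

```python
from collections import Counter

a_list = [1, 2, 3, 5, 1, 2, 1, 3, 1, 2]

# Map each duplicated number to number * count.
products = {k: k * v for k, v in sorted(Counter(a_list).items()) if v > 1}
print(products)
```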

Counter class extension

I am having trouble finding an elegant way to create a Counter() class that can:
Take an arbitrary number of keys and return a nested dictionary based on this list of keys.
Increment by an arbitrary amount in this nested dictionary.
For example:
counter = Counter()
for line in fin:
    if a:
        counter.incr(key1, 1)
    else:
        counter.incr(key2, key3, 2)
print(counter)
Ideally, I am hoping for a result that looks like {key1: 20, key2: {key3: 40}}. But I am stuck on creating this arbitrarily nested dictionary from a list of keys. Any help is appreciated.
You can subclass dict and create your own nested structure.
Here's my attempt at writing such a class:
class Counter(dict):
    def incr(self, *args):
        if len(args) < 2:
            raise TypeError("incr() takes at least 2 arguments (%d given)" % len(args))
        curr = self
        keys, count = args[:-1], args[-1]
        for depth, key in enumerate(keys, 1):
            if depth == len(keys):
                curr[key] = curr.setdefault(key, 0) + count
            else:
                curr = curr.setdefault(key, {})
counter = Counter()
counter.incr('key1', 1)
counter.incr('key2', 'key3', 2)
counter.incr('key1', 7)
print(counter)  # {'key1': 8, 'key2': {'key3': 2}}
There are two possibilities.
First, you can always fake the nested-keys thing by using a flat Counter with a "key path" made of tuples:
counter = Counter()
for line in fin:
    if a:
        counter.incr((key1,), 1)
    else:
        counter.incr((key2, key3), 2)
But then you'll need to write a str-replacement—or, better, a wrapper class that implements __str__. And while you're at it, you can easily write an incr wrapper that lets you use exactly the API you wanted:
def incr(self, *args):
    super().incr(args[:-1], args[-1])
Alternatively, you can build your own Counter-like class on top of a nested dict. The code for Counter is written in pure Python, and the source is pretty simple and readable.
From your code, it looks like you don't have any need to access things like counter[key2][key3] anywhere, which means the first is probably going to be simpler and more appropriate.
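A minimal sketch of the tuple "key path" idea could look like the following. Note that collections.Counter has no incr method of its own, so this version adds one on a subclass; the class name PathCounter and the nested() helper are hypothetical, and the helper assumes no key path is also a prefix of a longer path.

```python
from collections import Counter

class PathCounter(Counter):
    """Flat Counter keyed by tuples of keys (hypothetical name)."""

    def incr(self, *args):
        # All but the last argument form the key path; the last is the increment.
        if len(args) < 2:
            raise TypeError("incr() takes at least 2 arguments (%d given)" % len(args))
        self[args[:-1]] += args[-1]

    def nested(self):
        """Expand the flat tuple keys into a nested dict for display."""
        result = {}
        for path, count in self.items():
            node = result
            for key in path[:-1]:
                node = node.setdefault(key, {})
            node[path[-1]] = count
        return result

counter = PathCounter()
counter.incr('key1', 1)
counter.incr('key2', 'key3', 2)
counter.incr('key1', 7)
print(counter.nested())
```

The flat storage keeps Counter features such as most_common() available, while nested() is only needed when the nested view is wanted for printing.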
A Counter is designed to hold integer counts as its values, so it is not a good fit for representing a nested dictionary.
Here is one way to do this with a normal dictionary (counter = {}). First, to increment the value for a single key:
counter[key1] = counter.setdefault(key1, 0) + 1
Or for an arbitrary list of keys to create the nested structure:
tmp = counter
for key in key_list[:-1]:
    tmp = tmp.setdefault(key, {})
tmp[key_list[-1]] = tmp.setdefault(key_list[-1], 0) + 1
I would probably turn this into the following function:
def incr(counter, val, *keys):
    tmp = counter
    for key in keys[:-1]:
        tmp = tmp.setdefault(key, {})
    tmp[keys[-1]] = tmp.setdefault(keys[-1], 0) + val
Example:
>>> counter = {}
>>> incr(counter, 1, 'a')
>>> counter
{'a': 1}
>>> incr(counter, 2, 'a')
>>> counter
{'a': 3}
>>> incr(counter, 2, 'b', 'c', 'd')
>>> counter
{'a': 3, 'b': {'c': {'d': 2}}}
>>> incr(counter, 3, 'b', 'c', 'd')
>>> counter
{'a': 3, 'b': {'c': {'d': 5}}}
