I'm looking for a method to generate sequences from every-grams up to length n that match an input sentence:
Given a sentence: "Break this into sequences" and n = 3
I want to create the sequences:
("Break", "this", "into", "sequences")
("Break", "this", "into sequences")
("Break", "this into", "sequences")
("Break this", "into", "sequences")
("Break this", "into sequences")
("Break", "this into sequences")
("Break this into", "sequences")
nltk has the everygrams function, but I'm not quite sure how I'd use it toward my goal.
For simplicity, it may be helpful to consider these as character-grams instead (as rici suggested, each row is shown twice: with the characters spaced out, and without spacing):
abcd goes to:
(a, b, c, d) (a, b, c, d)
(a, b, c d) (a, b, cd)
(a, b c, d) (a, bc, d)
(a b, c, d) (ab, c, d)
(a b, c d) (ab, cd)
(a, b c d) (a, bcd)
(a b c, d) (abc, d)
For clarity, this should generalize to any length, given n as the maximum n-gram size; so, for abcde with n=3 we'd have:
(a, b, c, d, e) (a, b, c, d, e)
(a, b, c, d e) (a, b, c, de)
(a, b, c d, e) (a, b, cd, e)
(a, b c, d, e) (a, bc, d, e)
(a b, c, d, e) (ab, c, d, e)
(a, b c, d e) (a, bc, de)
(a b, c, d e) (ab, c, de)
(a b, c d, e) (ab, cd, e)
(a, b, c d e) (a, b, cde)
(a, b c d, e) (a, bcd, e)
(a b c, d, e) (abc, d, e)
(a b, c d e) (ab, cde)
(a b c, d e) (abc, de)
I'm thinking I may need to generate a grammar, something like:
exp ::= ABC, d | a, BCD
ABC ::= AB, c | A, BC
BCD ::= BC, d | b, CD
AB ::= A, b | a, B
BC ::= B, c | b, C
CD ::= C, d | c, D
A ::= a
B ::= b
C ::= c
D ::= d
and find all parses of the sentence, but certainly there must be a procedural way to go about this?
Maybe it would be helpful to space your example out a bit:
(a , b , c , d)
(a , b , c d)
(a , b c , d)
(a b , c , d)
(a b , c d)
(a , b c d)
(a b c , d)
(a b c d) # added for completeness
Looking at that, it's evident that what differentiates the rows is the presence or absence of commas, a typical binary choice. There are three places a comma could go, so there are eight possibilities, corresponding to the eight binary numbers of three digits.
The easiest way to list these possibilities is to count from 0 0 0 to 1 1 1.
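The counting idea above can be sketched in Python as follows. This is only an illustration of the binary-choice view, with no limit on part length; the function name all_splits is made up for this example. Each bit of the counter decides whether a comma goes into the corresponding gap between neighbouring items.

```python
def all_splits(tokens):
    """Yield every way to cut tokens into contiguous parts (no length limit)."""
    if not tokens:
        yield []
        return
    gaps = len(tokens) - 1  # number of places where a comma could go
    for mask in range(2 ** gaps):
        parts, current = [], [tokens[0]]
        for i in range(gaps):
            # Read bits from the most significant end so gaps run left to right.
            if (mask >> (gaps - 1 - i)) & 1:
                parts.append(current)          # bit set: comma here, close the part
                current = [tokens[i + 1]]
            else:
                current.append(tokens[i + 1])  # bit clear: keep extending the part
        parts.append(current)
        yield parts


splits = list(all_splits(list("abcd")))
for parts in splits:
    print(parts)
```

Counting from 0 0 0 gives the single unsplit part first, and 1 1 1 gives the fully split row last.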
For your modified question, in which there is a maximum length of a part, one simple recursive solution in Python is:
def kgram(k, v):
    'Generate all partitions of v with parts no larger than k'
    def helper(sfx, m):
        if m == 0:
            yield sfx
        else:
            for i in range(1, min(k, m) + 1):
                yield from helper([v[m-i:m]] + sfx, m - i)

    yield from helper([], len(v))
Here's a quick test:
>>> for p in kgram(3, 'one two three four five'.split()): print(p)
...
[['one'], ['two'], ['three'], ['four'], ['five']]
[['one', 'two'], ['three'], ['four'], ['five']]
[['one'], ['two', 'three'], ['four'], ['five']]
[['one', 'two', 'three'], ['four'], ['five']]
[['one'], ['two'], ['three', 'four'], ['five']]
[['one', 'two'], ['three', 'four'], ['five']]
[['one'], ['two', 'three', 'four'], ['five']]
[['one'], ['two'], ['three'], ['four', 'five']]
[['one', 'two'], ['three'], ['four', 'five']]
[['one'], ['two', 'three'], ['four', 'five']]
[['one', 'two', 'three'], ['four', 'five']]
[['one'], ['two'], ['three', 'four', 'five']]
[['one', 'two'], ['three', 'four', 'five']]
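To get tuples of strings in the shape the question asks for, the word lists in each partition can be joined back together. The following is a minimal sketch that repeats the recursive generator so it runs on its own; the wrapper name sentence_sequences is an assumption made for this example, not part of the answer.

```python
def kgram(k, v):
    """Generate all partitions of v into contiguous parts no longer than k."""
    def helper(sfx, m):
        if m == 0:
            yield sfx
        else:
            for i in range(1, min(k, m) + 1):
                yield from helper([v[m - i:m]] + sfx, m - i)

    yield from helper([], len(v))


def sentence_sequences(sentence, n):
    """Split on whitespace and join each part back into a single string."""
    words = sentence.split()
    return [tuple(" ".join(part) for part in parts) for parts in kgram(n, words)]


sequences = sentence_sequences("Break this into sequences", 3)
for seq in sequences:
    print(seq)
```

For the four-word sentence and n = 3 this yields the seven sequences listed in the question, though not necessarily in the same order.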
I have a dataframe column that looks like this (roughly 200 rows):
col1
a
b
c
d
e
f
I want to create a new dataframe with one column containing 15 random combinations of 3 items taken from the pandas column. For example:
new_df
combinations:
(a,b,c)
(a,c,d)
(a,d,c)
(b,a,d)
(d,a,c)
(a,d,f)
(e,a,f)
(a,f,e)
(b,e,f)
(f,b,e)
(c,b,e)
(b,e,a)
(a,e,f)
(e,f,a)
Currently the code I have generates every possible permutation and runs out of memory when I try to append the results to another dataframe:
import pandas as pd
from itertools import permutations

df = pd.read_csv('')
combo = df['col1'].tolist()
perm = permutations(combo, 3)
combinations = pd.DataFrame(columns=['combinations'])
list_ = []
for i in list(perm):
    combinations['combinations'] = i
    list_.append(i)
How do I make it stop after a given number X of random combinations, in this case 15 combinations of 3?
The reason your code runs out of memory is specifically the call to list(perm). Doing this generates EVERY possible permutation and stores it; for roughly 200 rows that is about 200 × 199 × 198 = 7,880,400 tuples. So when you do
for i in list(perm):
    ...
You're telling Python to build a list of all permutations and only then iterate over it. If you instead iterate over the generator that permutations returns (for i in perm: instead of for i in list(perm):), each permutation is produced one at a time and they are never all held in memory at once. So if you break out of the for loop after 15 iterations, you get the result you want.
However, since we're using itertools, we can vastly simplify that logic using islice to do the work of getting the first 15 without explicitly writing a for-loop and breaking at the 15th iteration:
import pandas as pd
from itertools import permutations, islice
# df = pd.read_csv('')
# combo = df['col1'].tolist()
combo = list("abcefg")
perm_generator = permutations(combo,3)
# take the first 15 permutations lazily, without generating the rest
first_15_perms = islice(perm_generator, 15)
# Store the first 15 permutations into a Series object
series_perms = pd.Series(list(first_15_perms), name="permutations")
print(series_perms)
0 (a, b, c)
1 (a, b, e)
2 (a, b, f)
3 (a, b, g)
4 (a, c, b)
5 (a, c, e)
6 (a, c, f)
7 (a, c, g)
8 (a, e, b)
9 (a, e, c)
10 (a, e, f)
11 (a, e, g)
12 (a, f, b)
13 (a, f, c)
14 (a, f, e)
Name: permutations, dtype: object
If you want this as a single column in a DataFrame you can use the to_frame() method:
df_perms = series_perms.to_frame()
print(df_perms)
permutations
0 (a, b, c)
1 (a, b, e)
2 (a, b, f)
3 (a, b, g)
4 (a, c, b)
5 (a, c, e)
6 (a, c, f)
7 (a, c, g)
8 (a, e, b)
9 (a, e, c)
10 (a, e, f)
11 (a, e, g)
12 (a, f, b)
13 (a, f, c)
14 (a, f, e)
While not quite as elegant as the previous answers, if you truly want a random sampling of values rather than just the first 15, you could also do something along the lines of the following:
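import random as rnd

def newFrame(df: pd.DataFrame, srccol: int, cmbs: int, rows: int) -> pd.DataFrame:
    # Select the source column by position, so srccol=0 means the first column
    il = df.iloc[:, srccol].values.tolist()
    nw_df = pd.DataFrame()
    data = []
    for r in range(rows):
        rd = []
        for ri in range(cmbs):
            # rnd.choice samples with replacement, so a value can repeat within a tuple
            rd.append(rnd.choice(il))
        data.append(tuple(rd))
    nw_df['Combinations'] = data
    return nw_df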
When it is passed a df like the one in your example, in the form of:
new_df = newFrame(df, 0, 3, 15)
Produces:
Combinations
0 (a, f, e)
1 (a, d, f)
2 (b, c, d)
3 (a, a, d)
4 (f, b, c)
5 (e, b, b)
6 (e, e, d)
7 (c, f, f)
8 (f, e, b)
9 (d, c, e)
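The sample output above contains tuples such as (a, a, d), because picking each element independently samples with replacement. If each tuple should contain three distinct items, one option is random.sample, which draws without replacement within a single tuple. This is a minimal sketch using only the standard library; the function name random_permutations and the seed parameter are assumptions added for this example.

```python
import random


def random_permutations(items, size, rows, seed=None):
    """Return `rows` random tuples of `size` distinct items each.

    Only `rows` tuples are ever built, so the full set of permutations
    is never generated. The same tuple may still occur in more than one row.
    """
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return [tuple(rng.sample(items, size)) for _ in range(rows)]


combo = list("abcdef")
sampled = random_permutations(combo, 3, 15, seed=0)
for row in sampled:
    print(row)
```

The resulting list can be passed to pd.Series(sampled, name="combinations") to get a single column, in the same way as the islice example.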
I am using the Tau Prolog library in my project and it has been working fine till I tried this.
I am attempting to output multiple schedules and their corresponding extra hours (the context is not relevant here).
Here is the output in the console when I use session.answers( x => console.log( pl.format_answer(x) ) );
// Query the goal
session.query("getSchedules(123,Schedule,ExtraHours).");
// Show answers
session.answers(x => console.log(pl.format_answer(x)));
CONSOLE:
Schedule = [a, b, c, d, e, f, m, g], ExtraHours = 0 ;
Schedule = [a, b, c, d, e, f, j, k], ExtraHours = 0 ;
Schedule = [a, b, c, d, e, f, j, x], ExtraHours = 0 ;
Schedule = [a, b, c, d, e, f, j, g], ExtraHours = -2 ;
Schedule = [a, b, c, d, e, f, k, x], ExtraHours = 0 ;
Schedule = [a, b, c, d, e, f, k, g], ExtraHours = -2 ;
Schedule = [a, b, c, d, e, f, x, g], ExtraHours = -2 ;
false.
I am trying to get all these schedules in 1 list but whenever I try to accumulate the results in a global variable, the resulting list only has the 1st schedule. Does anyone know how I can get a list with all the schedules?
How does this script work, and why does the variable b get 50 as its value and not 1?
a = 1
b = 50
b, b = a, b
print(b)
actual result: 50
b, b = a, b is actually a tuple assignment, and it works from left to right.
b, b = a, b evaluates to (b, b) = (1, 50) which in turn is executed as
b = 1
b = 50
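The left-to-right order can be observed directly. The sketch below repeats the script from the question and then adds a second case in which the first target changes the meaning of the second one; the names result_b, x and i are introduced only for this illustration.

```python
a = 1
b = 50
b, b = a, b      # the right side is built first as (1, 50); then b = 1, then b = 50
result_b = b
print(result_b)

x = [0, 1]
i = 0
i, x[i] = 1, 2   # i is assigned first, so x[i] now refers to x[1], not x[0]
print(i, x)
```

In the second case x ends up as [0, 2], which would not happen if the targets were assigned right to left or all at once.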
Real beginners question here. How do I represent a problem with multiple hypotheses in Lean? For example:
Given
A
A→B
A→C
B→D
C→D
Prove the proposition D.
(Problem taken from The Incredible Proof Machine, Session 2 problem 3. I was actually reading Logic and Proof, Chapter 4, Propositional Logic in Lean, but there are fewer exercises available there.)
Obviously this is completely trivial to prove by applying modus ponens twice, my question is how do I represent the problem in the first place?! Here's my proof:
variables A B C D : Prop
example : (( A )
/\ ( A->B )
/\ ( A->C )
/\ ( B->D )
/\ ( C->D ))
-> D :=
assume h,
have given1: A, from and.left h,
have given2: A -> B, from and.left (and.right h),
have given3: A -> C, from and.left (and.right (and.right h)),
have given4: B -> D, from and.left (and.right (and.right (and.right h))),
have given5: C -> D, from and.right (and.right (and.right (and.right h))),
show D, from given4 (given2 given1)
I think I've made far too much of a meal of packaging up the problem and then unpacking it. Could someone show me a better way of representing this problem, please?
I think it is a lot clearer if you don't combine the hypotheses with and (/\) and instead chain them with ->. Here are 2 equivalent proofs; I prefer the first:
def s2p3 {A B C D : Prop} (ha : A)
(hab : A -> B) (hac : A -> C)
(hbd : B -> D) (hcd : C -> D) : D
:= show D, from (hbd (hab ha))
The second is the same as the first except that it uses example. I believe you then have to name the hypotheses using assume rather than inside the declaration:
example : A -> (A -> B) -> (A -> C) -> (B -> D) -> (C -> D) -> D :=
assume ha : A,
assume hab : A -> B,
assume hac, -- You can actually just leave the types off the above 2
assume hbd,
assume hcd,
show D, from (hbd (hab ha))
If you want to use the def syntax but the problem is specified using the example syntax, you can state the example and give the def as its proof:
example : A -> (A -> B) -> (A -> C)
-> (B -> D) -> (C -> D) -> D := s2p3
Also, when using and in your proof, in the unpacking stage you unpack given3 and given5 but never use them in your "show" proof. So you don't need to unpack them, e.g.
example : (( A )
/\ ( A->B )
/\ ( A->C )
/\ ( B->D )
/\ ( C->D ))
-> D :=
assume h,
have given1: A, from and.left h,
have given2: A -> B, from and.left (and.right h),
have given4: B -> D, from and.left (and.right (and.right (and.right h))),
show D, from given4 (given2 given1)
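If you do want to keep the conjunction form, the unpacking can also be done in one step with an anonymous-constructor pattern in assume, rather than with chains of and.left and and.right. The following is a sketch in Lean 3 syntax, matching the rest of this thread; the hypothesis names are arbitrary, and hac and hcd are bound by the pattern but not used.

```lean
variables A B C D : Prop

-- The pattern ⟨ha, hab, hac, hbd, hcd⟩ destructures the right-nested
-- conjunction into its five components in a single step.
example : A ∧ (A → B) ∧ (A → C) ∧ (B → D) ∧ (C → D) → D :=
assume ⟨ha, hab, hac, hbd, hcd⟩,
show D, from hbd (hab ha)
```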
My COUNTIF function won't work for the letter "C". I checked for spaces with the LEN function and I am super confused. Thanks for the help.
#of Accident Type
A 28
B 19
C =COUNTIF(A2:A101, "*C*")
D 17
E 9
F 9
Accidents
A
B
D
A
A
F
C
A
C
B
E
B
A
C
F
D
B
C
D
A
A
C
B
E
B
C
E
A
B
A
A
A
B
C
C
D
F
D
B
B
A
F
C
B
A
C
B
E
E
D
A
B
C
E
A
A
F
C
B
D
D
D
B
D
C
A
F
A
A
B
D
E
A
E
D
B
C
A
F
A
C
D
D
A
A
B
A
F
D
C
A
C
B
F
D
A
E
A
C
D
It seems to work fine for me, but I did not put in the asterisks. Each cell (copied from the data example you gave) contains only the single character, with no spaces.
It probably does not work because the C is written in Cyrillic. To check whether this is the case, type a C with the English keyboard layout next to it and change the font to something fancy, e.g. Algerian. The two Cs will then look obviously different:
=COUNTIF(A2:A101,"*C*" )
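Outside of Excel, the same check can be done by looking at the code points instead of the glyphs. This is a minimal Python sketch, assuming the suspicious cell value has been pasted into a string; the helper name describe is made up for this example.

```python
import unicodedata


def describe(text):
    """Return (character, code point, Unicode name) for each character in text."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]


latin_c = "C"            # typed with an English keyboard layout
cyrillic_es = "\u0421"   # looks identical in most fonts

print(describe(latin_c))
print(describe(cyrillic_es))
print(latin_c == cyrillic_es)
```

The two characters compare as different, which is why a pattern containing one of them does not count cells containing the other.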
For some reason it's working now... I changed the font and messed around with it.
Thanks again for the help!!!!