I have to write a program that reads a paragraph from Romeo & Juliet and reports the number of characters, the number of spaces, the number of words, and the three most common characters, without using built-in helpers such as Counter. My program uses parallel lists: one holds each unique character and the other holds its tally at the same index. My problem is picking out the top three and printing the two lists together (e.g. "A 3").
This is the txt file the program reads:
But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief
This is the program I have so far:
charCount = []
uniqueChar = []
char_count = 0
word_Count = 0
space_count = 0
Open_File = open("romeo.txt")
for romeo in Open_File:
    for char in romeo:
        if (char == ' ' or char == '.'):
            word_Count += 1
        if (char == ' '):
            space_count += 1
        if (char != ' ' and char != '\n'):
            char_count += 1
            if (char not in uniqueChar):
                uniqueChar.append(char)
                charCount.append(1)
            else:
                for j in range(len(uniqueChar)):
                    if (uniqueChar[j] == char):
                        charCount[j] += 1
print("Spaces: ", space_count)
print("Char: ", char_count)
print("Words: ", word_Count)
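One possible way to pick the top three from the two parallel lists, sketched here without Counter or sorting helpers: repeatedly find the index of the largest tally, record that pair, and remove it from working copies. The function name top_three is hypothetical, and ties are resolved in favour of the character that was seen first, which is an assumption rather than a stated requirement.

```python
def top_three(unique_chars, char_counts):
    """Return up to three (char, count) pairs with the highest counts."""
    # Work on copies so the caller's parallel lists are left untouched.
    chars = list(unique_chars)
    counts = list(char_counts)
    result = []
    for _ in range(3):
        if not counts:
            break
        # Linear scan for the index of the largest remaining tally.
        best = 0
        for i in range(1, len(counts)):
            if counts[i] > counts[best]:
                best = i
        result.append((chars[best], counts[best]))
        # Remove the winner from both lists so the next pass finds the runner-up.
        del chars[best]
        del counts[best]
    return result


if __name__ == "__main__":
    for ch, n in top_three(['a', 'b', 'c', 'd'], [3, 5, 1, 4]):
        print(ch, n)
```

With the question's lists, the call would be top_three(uniqueChar, charCount), and each pair prints in the "A 3" style.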
I have seen many cases where people rely on counting whitespace, which leads to miscalculations.
For example, take two strings:
const str1: string = 'I love stackoverflow'
const str2: string = 'I   love   stackoverflow'
The numOfWhitespaces + 1 approach gives the wrong number of words for str2, because it counts all 6 spaces.
So what is an easy and better alternative?
The shortest would be: str1.split(/\s+/).length
But in case a beginner wants to do it with a basic loop, here it is:
let str1: string = 'I love stackoverflow'
let numberOfSpaces: number = 0
// Start at 1 so that index - 1 is always a valid position.
for (let index = 1; index < str1.length; index++) {
    const currentChar: string = str1.charAt(index)
    const lastChar: string = str1.charAt(index - 1)
    // Count a space only when it starts a run of spaces.
    if (currentChar === " " && lastChar !== " ") {
        numberOfSpaces = numberOfSpaces + 1
    }
    // A space that follows another space is skipped, so extra spaces are not counted.
    // I have not added an else statement for the case where neither character is a space,
    // because there was no need for it.
}
const finalNumberOfWords: number = numberOfSpaces + 1
console.log(`Number of words final are = ${finalNumberOfWords}`)
This looks similar to the whitespace-counting method, and it is, but it does not count the extra spaces (a space followed by another space).
A for loop runs over the length of the string and compares the character at the current position, str1[index], with the one before it. If both are spaces, nothing is counted; if the previous character is not a space and the current one is, the counter is incremented by one.
Finally, we add 1 to the counter to get the number of words.
An alternative solution would be to use a regex:
const str2: string = 'I   love   stackoverflow'
console.log(str2.split(/\s+/).length);
This ensures that a run of consecutive whitespace is treated as a single separator.
Test:
console.log('I love stackoverflow'.split(/\s+/).length);
console.log('Ilovestackoverflow'.split(/\s+/).length);
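Both approaches above can still miscount when the text has leading or trailing whitespace. As a rough sketch of one way around that (written in Python, with the hypothetical name count_words), you can count the positions where a word starts, that is, a non-space character preceded by a space or by the start of the text:

```python
def count_words(text):
    """Count words by counting the positions where a word starts."""
    words = 0
    previous_was_space = True  # treat the start of the text like a space
    for ch in text:
        is_space = ch.isspace()
        # A word starts at a non-space character that follows whitespace.
        if not is_space and previous_was_space:
            words += 1
        previous_was_space = is_space
    return words


if __name__ == "__main__":
    print(count_words('I   love   stackoverflow'))
```

Because it counts word starts rather than separators, an empty string gives 0 and repeated spaces make no difference.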
Given a string s containing only lower case alphabets (a - z), find (i.e print) the characters that are repeated.
For ex, if string s = "aabcacdddec"
Output: a c d
Three approaches to this problem exist:
[brute force] Compare every character of the string with every other character (i.e. s[i] with each s[j]) and print it if the two are the same.
Time complexity: O(n^2)
Space complexity: O(1)
[sort and then compare adjacent elements] After sorting (in O(n log n) time), traverse the string and check whether s[i] and s[i + 1] are equal.
Time complexity: O(n log n) + O(n) = O(n log n)
Space complexity: O(1)
[store the character count in an array] Create an array of size 26 (to keep track of a - z) and, for every s[i], increment the value stored at index s[i] - 'a'. Finally, traverse the array and print every character (i.e. 'a' + i) whose value is greater than 1.
Time complexity: O(n)
Space complexity: O(1), but we need a separate array to store the frequency of each character.
Is there an O(n) approach that DOES NOT use any array/hash table/map (etc.)?
HINT: Use BIT Vectors
This is the element distinctness problem, so generally speaking, no: there is no way to solve it in O(n) without extra space.
However, if you regard the alphabet as constant size (a-z only is certainly constant), you can either create a bitset of these characters in O(1) space (it is constant!) or, for each character of the alphabet, scan the string in O(n) to check whether it occurs more than once. That is O(constant*n), which is still O(n).
Pseudo code for 1st solution:
bit seen[] = new bit[SIZE_OF_ALPHABET] //constant!
bit printed[] = new bit[SIZE_OF_ALPHABET] //so is this!
for each i in seen.length: //init:
    seen[i] = 0
    printed[i] = 0
for each character c in string: //traverse the string:
    i = intValue(c)
    //already seen it and didn't print it? print it now!
    if seen[i] == 1 and printed[i] == 0:
        print c
        printed[i] = 1
    else:
        seen[i] = 1
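The first pseudo code could be sketched in Python with two integers standing in for the two bit arrays. This is only an illustration under the question's assumption that the input contains lowercase a-z only; the name repeated_chars is made up, and the result list is there just so the function can return its findings instead of printing them.

```python
def repeated_chars(s):
    """Return the repeated lowercase letters of s, each reported once."""
    seen = 0     # bit i set: the letter chr(ord('a') + i) has occurred
    printed = 0  # bit i set: that letter has already been reported
    result = []
    for c in s:
        bit = 1 << (ord(c) - ord('a'))
        # Already seen and not yet reported? Report it now.
        if seen & bit and not printed & bit:
            result.append(c)
            printed |= bit
        else:
            seen |= bit
    return result


if __name__ == "__main__":
    print(' '.join(repeated_chars("aabcacdddec")))
```

Letters come out in the order in which their first repeat is encountered.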
Pseudo code for 2nd solution:
for each character c from a-z: //constant number of repeats is O(1)
    count = 0
    for each character x in the string: //O(n)
        if x==c:
            count += 1
    if count > 1:
        print c
Implementation in Java
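public static void findDuplicate(String str) {
    int checker = 0; // bit i is set once the letter 'a' + i has been seen
    int printed = 0; // bit i is set once that letter has been reported
    char c = 'a';
    for (int i = 0; i < str.length(); ++i) {
        int val = str.charAt(i) - c;
        int bit = 1 << val;
        if ((checker & bit) != 0) {
            // Seen before: print it, but only the first time it repeats.
            if ((printed & bit) == 0) {
                System.out.println((char) (c + val));
                printed |= bit;
            }
        } else {
            checker |= bit;
        }
    }
}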
It stores its state in int bit masks and uses bitwise operators to find the duplicates.
It runs in O(n); an explanation follows.
Input as "abddc"
i==0
STEP #1 : val = 97 - 97 (0) str.charAt(0) is a and conversion char to int is 97 ( ascii of 'a')
STEP #2 : 1 << val equal to ( 1 << 0 ) equal to 1 finally 0 & 1 is 0
STEP #3 : checker = 0 | ( 1 << 0) equal to 0 | 1 equal to 1 checker is 1
i==1
STEP #1 : val = 98 - 97 (1) str.charAt(1) is b and conversion char to int is 98 ( ascii of 'b')
STEP #2 : 1 << val equal to ( 1 << 1 ) equal to 2 finally 1 & 2 is 0
STEP #3 : checker = 1 | ( 1 << 1) equal to 1 | 2 equal to 3 finally checker is 3
i==2
STEP #1 : val = 100 - 97 (3) str.charAt(2) is d and conversion char to int is 100 ( ascii of 'd')
STEP #2 : 1 << val equal to ( 1 << 3 ) equal to 8 finally 3 & 8 is 0
STEP #3 : checker = 3 | ( 1 << 3) equal to 3 | 8 equal to 11 checker is 11
i==3
STEP #1 : val = 100 - 97 (3) str.charAt(3) is d and conversion char to int is 100 ( ascii of 'd')
STEP #2 : 1 << val equal to ( 1 << 3 ) equal to 8 finally 11 & 8 is 8
Now print 'd' since the value is non-zero
You can also use a bit vector; depending on the language, it would be space efficient. In Java I would prefer to use an int for this fixed (just 26 letters) case.
The size of the character set is a constant, so you could scan the input 26 times. All you need is a counter to store the number of times you've seen the character corresponding to the current iteration. At the end of each iteration, print that character if your counter is greater than 1.
It's O(n) in runtime and O(1) in auxiliary space.
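A minimal Python sketch of that 26-pass idea, assuming lowercase a-z input only. The name repeated_by_scanning is hypothetical, and the list is used only to hand the results back to the caller; the scanning itself needs just one counter.

```python
def repeated_by_scanning(s):
    """Scan s once per letter of the alphabet and collect letters seen more than once."""
    result = []
    for offset in range(26):  # constant number of passes
        letter = chr(ord('a') + offset)
        count = 0
        for ch in s:  # one O(n) pass per letter
            if ch == letter:
                count += 1
        if count > 1:
            result.append(letter)
    return result


if __name__ == "__main__":
    print(' '.join(repeated_by_scanning("aabcacdddec")))
```

Unlike the bit-mask version, this reports the repeated letters in alphabetical order.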
Implementation in C# (recursive solution)
static void getNonUniqueElements(string s, string nonUnique)
{
    if (s.Length > 0)
    {
        char ch = s[0];
        s = s.Substring(1);
        // >= 0: the repeat may sit at index 0 of the remaining string
        if (s.LastIndexOf(ch) >= 0)
        {
            if (nonUnique.LastIndexOf(ch) < 0)
                nonUnique += ch;
        }
        getNonUniqueElements(s, nonUnique);
    }
    else
    {
        Console.WriteLine(nonUnique);
        return;
    }
}
static void Main(string[] args)
{
    getNonUniqueElements("aabcacdddec", "");
    Console.ReadKey();
}
PYTHON QN:
Using just one loop, how do I devise an algorithm that counts the number of substrings that begin with the character A and end with the character X? For example, given the input string CAXAAYXZA, there are four substrings that begin with A and end with X, namely: AX, AXAAYX, AAYX, and AYX.
For example:
>>>count_substring('CAXAAYXZA')
4
Since you didn't specify a language, I'm writing it in C++-like code:
int count_substring(string s)
{
    int inc = 0;
    int substring_count = 0;
    for(int i = 0;i < s.length();i++)
    {
        if(s[i] == 'A') inc++;
        if(s[i] == 'X') substring_count += inc;
    }
    return substring_count;
}
and in Python
def count_substring(s):
    inc = 0
    substring_count = 0
    for c in s:
        if c == 'A':
            inc += 1
        if c == 'X':
            substring_count += inc
    return substring_count
First count the number of "A"s in the string.
Then count the "X"s in the string,
using
Public Function CountCharacter(ByVal value As String, ByVal ch As Char) As Integer
    Dim cnt As Integer = 0
    For Each c As Char In value
        If c = ch Then cnt += 1
    Next
    Return cnt
End Function
Then take each "A" as a start position and each "X" after it as an end position, and take the substring between them. Do this for every "X", then move to the second "A" and repeat. This gives all the substrings starting with "A" and ending with "X".
Just another solution in Python:
def count_substring(s):
    length = len(s) + 1
    found = []
    for i in range(0, length):
        for j in range(i+1, length):
            if s[i] == 'A' and s[j-1] == 'X':
                found.append(s[i:j])
    return found
string = 'CAXAAYXZA'
print(count_substring(string))
Output:
['AX', 'AXAAYX', 'AAYX', 'AYX']
I am currently learning Python and I am trying to get this game to work. Basically I assigned a word to be guessed and then sliced the word and assigned it to several other variables. Basically, each variable assigned as "letterx" is a letter which makes up part of the string variable word. The problem is getting the while statement with nested if statements to work. For some reason I can't get the guess input to equal letterx. All I get when I run the code is "No." and then the amount of turns left. However, I can't get the elif statement to work. Pretty much everything else works. I'm used to programming in Java and I am fairly new to Python so any tips or help would be greatly appreciated. Thank you for your time and help! Here's the code:
#Guess The Word
word = "action"
letter1 = ""
letter2 = ""
letter3 = ""
letter4 = ""
letter5 = ""
letter6 = ""
position1 = 0
position2 = 1
position3 = 2
position4 = 3
position5 = 4
position6 = 5
letter1 += word[position1]
letter2 += word[position2]
letter3 += word[position3]
letter4 += word[position4]
letter5 += word[position5]
letter6 += word[position6]
print("Welcome to Guess the Word!\n")
count = 6
while(count != 0):
    guess = input("Take a guess: \n")
    if(guess != letter1 or guess != letter2 or guess != letter3 or guess !=
       letter4 or guess != letter5 or guess != letter6):
        count -= 1
        print("No.\n")
        print("Turns left: \n", count)
    elif(guess == letter1 or guess == letter2 or guess == letter3
         or guess == letter4 or guess == letter5 or guess == letter6):
        count -= 1
        print("Yes.\n")
if(count == 0):
    print("Your turns are up, what do you think the word is?")
    guess = input("The word is...: \n")
    if(guess == word):
        print("You win! That's the word")
    elif(guess != word):
        print("Sorry, you lose.")
Here's the program running in the Python shell:
Python 3.1.1 (r311:74483, Aug 17 2009, 17:02:12) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>>
Welcome to Guess the Word!
Take a guess:
a
No.
Turns left:
5
Take a guess:
c
No.
Turns left:
4
Take a guess:
t
No.
Turns left:
3
Take a guess:
i
No.
Turns left:
2
Take a guess:
o
No.
Turns left:
1
Take a guess:
n
No.
Turns left:
0
Your turns are up, what do you think the word is?
The word is...:
action
You win! That's the word
Let's say guess equals letter1. Then even though
guess == letter1, the first condition is still True, since guess != letter2. Similarly, no matter what guess is, there is always some letter (among letter1, letter2, etc.) that it is not equal to.
So the first if condition is always True.
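To make the difference concrete, here is a small sketch contrasting the chained "or" test with the test that was probably intended. The helper names any_differs and all_differ are invented for this illustration.

```python
word = "action"


def any_differs(guess):
    # Mirrors the original chain of "guess != letterN or ...":
    # True as soon as guess differs from at least one letter.
    return any(guess != letter for letter in word)


def all_differ(guess):
    # The intended test: True only if guess differs from every letter.
    return all(guess != letter for letter in word)


if __name__ == "__main__":
    print(any_differs('a'), all_differ('a'))
    print(any_differs('z'), all_differ('z'))
```

For a word with at least two different letters, any_differs is True for every guess, which is why the "No." branch always runs.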
Instead, you could use
while(count != 0):
    guess = input("Take a guess: \n")
    if guess not in word:
        count -= 1
        print("No.\nTurns left: \n", count)
    else:
        count -= 1
        print("Yes.\n")
By the way, it should be entirely possible to code the game without defining letter1, letter2, etc. All this code should be deleted:
letter1 = ""
letter2 = ""
letter3 = ""
letter4 = ""
letter5 = ""
letter6 = ""
position1 = 0
position2 = 1
position3 = 2
position4 = 3
position5 = 4
position6 = 5
letter1 += word[position1]
letter2 += word[position2]
letter3 += word[position3]
letter4 += word[position4]
letter5 += word[position5]
letter6 += word[position6]
Just use word[0] in place of letter1, and word[1] in place of letter2, etc.
And note you may not even need word[0], word[1]. For example,
with Python you can use
guess in word
instead of
guess in (word[0], word[1], word[2], word[3], word[4], word[5])
It's not only a lot less typing, it is more general, since guess in word does the right thing with words of any length.
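One caveat worth keeping in mind: for strings, "in" tests for substrings, so an empty input or a multi-letter input such as "act" also counts as being in the word. If the game should accept single letters only, a small guard along these lines might help (is_letter_of is a hypothetical helper, not part of the original code):

```python
def is_letter_of(guess, word):
    """True only when guess is exactly one character that occurs in word."""
    return len(guess) == 1 and guess in word


if __name__ == "__main__":
    word = "action"
    for guess in ("c", "", "act", "z"):
        print(repr(guess), guess in word, is_letter_of(guess, word))
```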
I do a lot of urgent analysis of large log files. Often this requires tailing a log and looking for changes.
I'm keen to have a solution that will highlight these changes to make it easier for the eye to track.
I have investigated tools and there doesn't appear to be anything out there that does what I am looking for. I've written some scripts in Perl that do it roughly, but I would like a more complete solution.
Can anyone recommend a tool for this?
Levenshtein distance
Wikipedia:
The Levenshtein distance between two strings is the minimum number of operations needed to transform one string into the other, where an operation is an insertion, deletion, or substitution of a single character.
public static int LevenshteinDistance(char[] s1, char[] s2) {
    int s1p = s1.length, s2p = s2.length;
    int[][] num = new int[s1p + 1][s2p + 1];
    // fill arrays
    for (int i = 0; i <= s1p; i++)
        num[i][0] = i;
    for (int i = 0; i <= s2p; i++)
        num[0][i] = i;
    for (int i = 1; i <= s1p; i++)
        for (int j = 1; j <= s2p; j++)
            num[i][j] = Math.min(Math.min(num[i - 1][j] + 1,
                    num[i][j - 1] + 1), num[i - 1][j - 1]
                    + (s1[i - 1] == s2[j - 1] ? 0 : 1));
    return num[s1p][s2p];
}
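For comparison, the same recurrence can be sketched in Python while keeping only two rows of the table at a time. This is an illustrative variant, not the code of the sample application, and the name levenshtein is chosen here for convenience.

```python
def levenshtein(a, b):
    """Edit distance between a and b using the standard dynamic-programming recurrence."""
    # previous[j] holds the distance between the first i-1 chars of a and the first j chars of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning i characters into an empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if the chars match)
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

Keeping two rows instead of the full matrix reduces memory to O(len(b)), at the cost of not being able to trace back the actual edit sequence.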
Sample App in Java
String Diff
The application uses the LCS algorithm to merge two text inputs into one. The result contains a minimal set of instructions for turning one string into the other. The merged text is displayed below the instructions.
Download application:
String Diff.jar
Download source:
Diff.java
I wrote a Python script for this purpose that utilizes difflib.SequenceMatcher:
#!/usr/bin/python3
from difflib import SequenceMatcher
from itertools import tee
from sys import stdin
def pairwise(iterable):
    """s -> (s0,s1), (s1,s2), (s2, s3), ...
    https://docs.python.org/3/library/itertools.html#itertools-recipes
    """
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)
def color(c, s):
    """Wrap string s in color c.
    Based on http://stackoverflow.com/a/287944/1916449
    """
    try:
        lookup = {'r':'\033[91m', 'g':'\033[92m', 'b':'\033[1m'}
        return lookup[c] + str(s) + '\033[0m'
    except KeyError:
        return s
def diff(a, b):
    """Returns a list of paired and colored differences between a and b."""
    for tag, i, j, k, l in SequenceMatcher(None, a, b).get_opcodes():
        if tag == 'equal': yield 2 * [color('w', a[i:j])]
        if tag in ('delete', 'replace'): yield color('r', a[i:j]), ''
        if tag in ('insert', 'replace'): yield '', color('g', b[k:l])
if __name__ == '__main__':
    for a, b in pairwise(stdin):
        print(*map(''.join, zip(*diff(a, b))), sep='')
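To see what the script works from, it may help to look at the opcodes that SequenceMatcher produces for a pair of short strings. The helper describe_changes below is only an illustration and is not part of the script above.

```python
from difflib import SequenceMatcher


def describe_changes(a, b):
    """List (tag, piece of a, piece of b) for each opcode turning a into b."""
    changes = []
    for tag, i, j, k, l in SequenceMatcher(None, a, b).get_opcodes():
        # a[i:j] is the affected span of a, b[k:l] the corresponding span of b.
        changes.append((tag, a[i:j], b[k:l]))
    return changes


if __name__ == "__main__":
    for change in describe_changes("abcd", "abXd"):
        print(change)
```

Each tag is one of 'equal', 'replace', 'delete', or 'insert'. The script leaves 'equal' spans uncolored and colors the removed and added spans.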
Example input.txt:
108 finished /tmp/ts-out.5KS8bq 0 435.63/429.00/6.29 ./eval.exe -z 30
107 finished /tmp/ts-out.z0tKmX 0 456.10/448.36/7.26 ./eval.exe -z 30
110 finished /tmp/ts-out.wrYCrk 0 0.00/0.00/0.00 tail -n 1
111 finished /tmp/ts-out.HALY18 0 460.65/456.02/4.47 ./eval.exe -z 30
112 finished /tmp/ts-out.6hdkH5 0 292.26/272.98/19.12 ./eval.exe -z 1000
113 finished /tmp/ts-out.eFBgoG 0 837.49/825.82/11.34 ./eval.exe -z 10
Output of cat input.txt | ./linediff.py:
http://neil.fraser.name/software/diff_match_patch/svn/trunk/demos/demo_diff.html
This looks promising; I will update this with more info once I've played with it more.