Find word position (not character position) in a string

I am sorting the words in a given paragraph from a text file. I already have the line numbers and the number of occurrences of each word worked out. However, I am having trouble getting the position of each word within its line. I am using an object list to sort the strings.
For example:
Given line
I have a question about the word position number
Output should be
question - line 1 position 4
about - line 1 position 5
position - line 1 position 8
Any useful suggestions how I should go about this? Below is my code.
public void visit() {
ListNode p = lines.getFirstNode();
if (word.length() < 4) {
so.output(word + "\t\t\t\t" + count + "\t\t\t\t");
while (p != null) {
so.output(p.getInfo() + " ");
p = p.getNext();
}
so.output("\n");
...

You're looking for:
String line = "I have a question about the word position number";
String[] lineArray = line.split(" ");
int wordPos = Arrays.asList(lineArray).indexOf("question");
Just remember that wordPos will be 3 here, because list indices start at 0, so you'll probably want to add 1 when formatting your output string, since users won't expect zero-based positions. Note also that indexOf returns only the first occurrence of the word, and -1 if the word is not in the line.
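To get a position for every word rather than looking up one word at a time, one option is to walk each line once and record 1-based positions as you go. The sketch below is in Python for brevity and is only an illustration: `word_positions` and its return shape are hypothetical, and it assumes words are separated by whitespace with no punctuation handling.

```python
from collections import defaultdict

def word_positions(lines):
    """Map each word to a list of (line number, word position) pairs, both 1-based."""
    positions = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        # split() with no argument splits on any run of whitespace.
        for word_no, word in enumerate(line.split(), start=1):
            positions[word].append((line_no, word_no))
    return positions

index = word_positions(["I have a question about the word position number"])
for word in ("question", "about", "position"):
    for line_no, word_no in index[word]:
        print(f"{word} - line {line_no} position {word_no}")
```

Unlike a single indexOf call, this records repeated words too, since each occurrence gets its own (line, position) pair.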


(dart) replaceAll method in a string in a loop (cypher)

I'm attempting to do the CS50 course in Dart, and for the week 2 substitution exercise I'm stuck with this:
void main(List<String> args) {
String alphabet = 'abcdefghijklmnopqrstuvwxyz';
String cypher = 'qwertyuiopasdfghjklzxcvbnm';
int n = alphabet.length;
print('entertext:');
String text = stdin.readLineSync(encoding: utf8)!;
for (int i = 0; i < n; i++) {
text = text.replaceAll(alphabet[i], cypher[i]);
}
print(text);
}
Expected result: abcdef = qwerty
Actual result: jvmkmn
Any ideas why this is happening? I'm a total beginner by the way
It is because you first substitute the letter a with the letter q, but later, when i = 16, you replace every letter q with the letter j. This is why your a is turned into a j, and so forth...
Best of luck to you :)
For the record, the (very direct and) safer approach would be:
import 'dart:convert';
import 'dart:io';
void main(List<String> args) {
String alphabet = 'abcdefghijklmnopqrstuvwxyz';
String cypher = 'qwertyuiopasdfghjklzxcvbnm';
assert(alphabet.length == cypher.length);
// Pattern matching any character in `alphabet`.
var re = RegExp('[${RegExp.escape(alphabet)}]');
print('enter text:');
String text = stdin.readLineSync(encoding: utf8)!;
// Replace each character matched by `re` with the corresponding
// character in `cypher`.
text = text.replaceAllMapped(re, (m) => cypher[alphabet.indexOf(m[0]!)]);
print(text);
}
(This is not an efficient approach. It does a linear lookup in the alphabet for each character. A more efficient approach would either recognize that the alphabet is a contiguous range of character codes, and just do some arithmetic to find the position in the alphabet, or (if it wasn't a contiguous range) could build a more efficient lookup table for the alphabet first).
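For comparison, here is a minimal Python sketch of the lookup-table idea mentioned above. It is not Dart, and `substitute` is a hypothetical helper name. `str.maketrans` builds the table once, so each character is mapped in a single pass and a letter that has already been replaced is never replaced again.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
CYPHER = "qwertyuiopasdfghjklzxcvbnm"

# Build the character-to-character table once, up front.
TABLE = str.maketrans(ALPHABET, CYPHER)

def substitute(text):
    """Map every lowercase letter through the cypher; other characters pass through."""
    return text.translate(TABLE)

print(substitute("abcdef"))  # prints "qwerty"
```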

How to find the longest existing substring of A's in a DNA string line and return the length?

I came across the following problem and do not know how to solve it. The problem is to find the length of the longest run of repeated A's and return that length for every string in this list:
['>KF735813.1 HIV-1 isolate Cameroon1(ViroSeq) HIV DR 02 from Cameroon
pol protein (pol) gene, partial cds',
'CCTCAAATCACTCTTTGGCAACGACCCTTAGTCACAGTTAGGATAGAGGGACAGTTAATAGAAGCCCTATTAGACACAGG',
'GGCAGATGATACAGTATTAGAAGAGATAAATTTACCAGGAAAATGGAAACCAAAAATGATAGGGGGAATTGGAGGTTTTA',
'TCAAAGTAAGACAGTATGATCAGATACTTATAGAAATTTGTGGAAAAAGGGCCATAGGTACAGTATTAGTAGGACCTACA',
'CCTGTCAACATAATTGGACGAAACATGTTGACTCAGATTGGTTGTACTTTAAATTTTCCAATTAGTCCTATTGAAACTGT',
'GCCAGTAAAATTAAAGCCAGGTATGGATGGCCCAAAGGTAAAACAATGGCCATTGACAGAAGAAAAAATAAAAGCATTAA',
'CAGAAATTTGTACAGATATGGAAAAGGAGGGAAAAATTTCAATAATTGGGCCTGAAAATCCATATAATACTCCAGTATTT',
'GCCATAAAGAAAAAAGATAGTACTAAATGGAGAAAATTAGTAGATTTTAGAGAACTTAATAAGAGAACTCAAGACTTCTG',
'GGAGATCCAATTAGGAATACCTCATCCCGCGGGATTAAAAAAGAACAAATCAGTAACAGTACTAGATGTGGGGGATGCAT',
'ATTTTTCAGTTCCCTTAGATTAAGACTTTAGAAAGTACACTGCATTCACTATACCTAGTTTAAATAATGCAACACCAGGT',
'ATTAGATATCAGTACAATGTGCTTCCACAGGGATGGAAAGGATCACCAGCAATATTTCAGGCAAGCATGACAAAAATCTT',
'AGAGCCCTTTAGGACAAAATATCCAGAAATAGTGATCTACCAATATATGGATGATTTATATGTAGGATCAGACTTAGAGA',
'TAGGGCAGCATAGAGCAAAAATAGAGGAGTTGAGAGTACATCTATTGAAGTGGGGATTTACCACACCAGACAAAAAACAT',
'CAGAAAGAACCTCCATTTCTTTGGATGGGATATGAACTCCATCCTGACAAATGGACAGTCCAGCCTATACAGCTGCCAGA',
'AAAAGACAGCTGGACTGTCAATGATATACAGAAATTAGTGGGAAAACTAAATTGGGCAAGTCAGATTTATGCAGGAATTA',
'AAGTAAAGCAACTGTGTAGACTCCTCAGGGGAGCCAAAGCACTAACAGAGGTAGTACCACTAACTGAGGAAGCAGAATTA',
'GAATTGGCAGATAACAGGGAGATTCTAAAAGAACCTGTACATGGAGTATATTATGACCCAACAAAAGACTTAGTAGCAGA',
'AATACAGAAGCAAGGGCAAGAC']
Here is the function I have tried to do, but I know this is the wrong approach:
for c in range(len(fastarec_Lines)):
if fastarec_Lines[c].count('A') == current:
count += 1
else:
count = 1
current = fastarec_Lines[c]
maximum = max(count,maximum)
return maximum
Can someone help me out ?
One approach would be to do a regex findall search on the pattern A+. Then sort the resulting list by length and print out the last element:
import re
seq = "AATTGGCCAAAAATTGCA"
matches = re.findall(r'A+', seq)
# Sort the runs by length; list.sort no longer accepts a cmp function in Python 3.
matches.sort(key=len)
print("longest string is " + matches[-1] + " with a length of " + str(len(matches[-1])))
This prints:
longest string is AAAAA with a length of 5
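Applied to the list from the question, the same idea could be wrapped in a function. This is a sketch under assumptions: `longest_a_run` and `longest_a_run_per_line` are hypothetical names, header lines are assumed to start with '>', and each line is measured on its own, so a run that continues across a line break is not merged.

```python
import re

def longest_a_run(seq):
    """Length of the longest run of consecutive 'A's in seq (0 if there is none)."""
    return max((len(m) for m in re.findall(r"A+", seq)), default=0)

def longest_a_run_per_line(lines):
    """Longest 'A' run for every sequence line, skipping FASTA header lines."""
    return [longest_a_run(line) for line in lines if not line.startswith(">")]

print(longest_a_run("AATTGGCCAAAAATTGCA"))  # prints 5
```

To treat the whole record as one sequence instead, join the non-header lines with "".join(...) and call longest_a_run once on the result.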

Skipping spaces in Groovy

I'm trying to write a conditional statement where I can skip a specific space then start reading all the characters after it.
I was thinking to use substring but that wouldn't help because substring will only work if I know the exact number of characters I want to skip but in this case, I want to skip a specific space to read characters afterward.
For example:
String text = "ABC DEF W YZ" //number of characters before the spaces are unknown
String test = "A"
if ( test == "A") {
return text (/*escape the first two space and return anything after that*/)
}
You can split your string on " " with tokenize, drop the first N elements from the returned list (where N is the number of spaces you want to skip) and join what's left with " ".
Supposing your N is 2:
String text = "ABC DEF W YZ" //number of characters before the spaces are unknown
String test = "A"
if ( test == "A") {
return text.tokenize(" ").drop(2).join(" ")
}
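The same split, drop and rejoin idea can be sketched in Python for comparison. `skip_spaces` is a hypothetical helper, and returning an empty string when there are fewer than n spaces is an assumption rather than something the question specifies.

```python
def skip_spaces(text, n):
    """Return everything after the n-th space, or "" if there are fewer than n spaces."""
    # Limiting the split to n keeps the remainder of the string in one piece.
    parts = text.split(" ", n)
    return parts[n] if len(parts) > n else ""

print(skip_spaces("ABC DEF W YZ", 2))  # prints "W YZ"
```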

Check If String Remains Same After Rotating Left Or Right N Places

Suppose we have a string "bcb".
If we rotate it 1 place left, it becomes "cbb",
which is not the same as the original.
If we rotate it 2 places left, it becomes "bbc",
which is not the same as the original.
If we rotate it 3 places left, it becomes "bcb",
which IS the same as the original.
What is best way to find if a string remains same after rotating it n places left or right?
To check whether the rotated string is still the same as the original, loop through the characters and check that the character at index i equals the character at index (i + n) mod length, where n is the number of rotations. If any pair differs, return false; if every pair matches, return true.
Here is the small Python code snippet for reference:
sample_str = 'bcb'
num_rotation = 6
def checkSimilarity(s, num_rotation):
    length = len(s)
    for i in range(length):
        # Compare each character with the one num_rotation places further on, wrapping around.
        if s[i] != s[(num_rotation + i) % length]:
            return False
    return True
print(checkSimilarity(sample_str, num_rotation))
Hope this helps.
Concatenate the string to itself (length 2n). Step through the resulting string in n-length slices, looking for a match.
double = orig + orig
orig_len = length(orig)
for i in [1:orig_len]
    if double[i:i+orig_len] == orig
        print "success at index", i
If your implementation language has a built-in substring search, use that instead of this loop.
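In Python the built-in substring search makes the doubling trick short. The sketch below is one possible reading of it, with `unchanged_after_rotation` as a hypothetical name: it finds the smallest positive left rotation that reproduces the string, then checks whether n is a multiple of it. Negative n is treated as a rotation to the right.

```python
def unchanged_after_rotation(s, n):
    """True if rotating s by n places (left if n > 0, right if n < 0) gives s back."""
    if not s:
        return True
    # Searching from index 1 skips the trivial match at 0; a match at len(s) always exists.
    period = (s + s).find(s, 1)
    return n % period == 0

print(unchanged_after_rotation("bcb", 1))   # prints False
print(unchanged_after_rotation("bcb", 3))   # prints True
print(unchanged_after_rotation("bcbc", 2))  # prints True
```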
Have two indexes into the string. Start one at the first character, and start the second at the nth character. Now, compare character-by-character.
You need to handle the wraparound case for the second index, but that's no trouble.
Pseudocode:
mystring = "bcbc"
ix1 = 0
ix2 = n mod mystring.length
while (ix1 < mystring.length)
    if mystring[ix1] != mystring[ix2]
        return false
    ++ix1
    ix2 = (ix2 + 1) mod mystring.length
end while
return true
If you rotate one space to the left, this will return false. Two spaces, it will return true.

Change Letters in A String One at a Time (Pandas,Python3)

I have a list of words in Pandas (DF)
Words
Shirt
Blouse
Sweater
What I'm trying to do is swap out certain letters in those words with letters from my dictionary one letter at a time.
so for example:
mydict = {"e":"q,w",
"a":"z"}
would create a new list that first replaces all the "e" in a list one at a time, and then iterates through again replacing all the "a" one at a time:
Words
Shirt
Blouse
Sweater
Blousq
Blousw
Swqater
Swwater
Sweatqr
Sweatwr
Swezter
I've been looking around at solutions here: Mass string replace in python?
and have tried the following code but it changes all instances "e" instead of doing so one at a time -- any help?:
mydict = {"e":"q,w"}
s = DF
for k, v in mydict.items():
for j in v:
s['Words'] = s["Words"].str.replace(k, j)
DF["Words"] = s
this doesn't seem to work either:
s = DF.replace({"Words": {"e": "q","w"}})
This answer is very similar to Brian's answer, but a little bit sanitized and the output has no duplicates:
words = ["Words", "Shirt", "Blouse", "Sweater"]
md = {"e": "q,w", "a": "z"}
md = {k: v.split(',') for k, v in md.items()}
newwords = []
for word in words:
    newwords.append(word)
    for c in md:
        occ = word.count(c)
        pos = 0
        for _ in range(occ):
            pos = word.find(c, pos)
            for r in md[c]:
                tmp = word[:pos] + r + word[pos+1:]
                newwords.append(tmp)
            pos += 1
Content of newwords:
['Words', 'Shirt', 'Blouse', 'Blousq', 'Blousw', 'Sweater', 'Swqater', 'Swwater', 'Sweatqr', 'Sweatwr', 'Swezter']
Prettyprint:
Words
Shirt
Blouse
Blousq
Blousw
Sweater
Swqater
Swwater
Sweatqr
Sweatwr
Swezter
Any errors are a result of the current time. ;)
Update (explanation)
tl;dr
The main idea is to find the occurrences of the character in the word one after another. For each occurrence we then replace it with each replacement character (again one after another). Each replaced word gets added to the output list.
I will try to explain everything step by step:
words = ["Words", "Shirt", "Blouse", "Sweater"]
md = {"e": "q,w", "a": "z"}
Well. Your basic input. :)
md = {k: v.split(',') for k, v in md.items()}
A simpler way to deal with the replacement dictionary. md now looks like {"e": ["q", "w"], "a": ["z"]}. Now we don't have to handle "q,w" and "z" differently: the replacement step is the same for both and ignores the fact that "a" has only one replacement character.
newwords = []
The new list to store the output in.
for word in words:
newwords.append(word)
We have to do those actions for each word (I assume the reason is clear). We also append the word directly to our newly created output list (newwords).
for c in md:
c as short for character. So for each character we want to replace (all keys of md), we do the following stuff.
occ = word.count(c)
occ for occurrences (yeah. count would fit as well :P). word.count(c) returns the number of occurrences of the character/string c in word. So "Sweater".count("o") => 0 and "Sweater".count("e") => 2.
We use this here to know how often we have to look at word to get all those occurrences of c.
pos = 0
Our startposition to look for c in word. Comes into use in the next loop.
for _ in range(occ):
For each occurrence. As the loop counter itself has no value for us here, we "discard" it by naming it _. At this point we know that c is in word, but not yet where.
pos = word.find(c, pos)
Oh. Look. We found c. :) word.find(c, pos) returns the index of the first occurrence of c in word, starting at pos. At the beginning, this means from the start of the string => the first occurrence of c. But with this call we already update pos. This plus the last line (pos += 1) moves our search window for the next round to start just behind the previous occurrence of c.
for r in md[c]:
Now you see why we updated md previously: we can easily iterate over it now (a md[c].split(',') on the old md would do the job as well). So we now do the replacement for each of the replacement characters.
tmp = word[:pos] + r + word[pos+1:]
The actual replacement. We store it in tmp (for debugging reasons). word[:pos] gives us word up to the (current) occurrence of c (excluding c). r is the replacement. word[pos+1:] adds the remaining word (again without c).
newwords.append(tmp)
Our so created new word tmp now goes into our output-list (newwords).
pos += 1
The already mentioned adjustment of pos to "jump over c".
Additional question from OP: Is there an easy way to dictate how many letters in the string I want to replace [(meaning e.g. multiple at a time)]?
Sure. Here is one way to achieve this:
words = ["Words", "Shirt", "Blouse", "Sweater", "multipleeee"]
md = {"e": "q,w", "a": "z"}
md = {k: v.split(',') for k, v in md.items()}
num = 2 # this is the number of replaces at a time.
newwords = []
for word in words:
    newwords.append(word)
    for char in md:
        for r in md[char]:
            pos = multiples = 0
            current_word = word
            while current_word.find(char, pos) != -1:
                pos = current_word.find(char, pos)
                current_word = current_word[:pos] + r + current_word[pos+1:]
                pos += 1
                multiples += 1
                if multiples == num:
                    newwords.append(current_word)
                    multiples = 0
                    current_word = word
Content of newwords:
['Words', 'Shirt', 'Blouse', 'Sweater', 'Swqatqr', 'Swwatwr', 'multipleeee', 'multiplqqee', 'multipleeqq', 'multiplwwee', 'multipleeww']
Prettyprint:
Words
Shirt
Blouse
Sweater
Swqatqr
Swwatwr
multipleeee
multiplqqee
multipleeqq
multiplwwee
multipleeww
I added multipleeee to demonstrate how the replacement works: for num = 2 the first two occurrences are replaced, and after them the next two, so the replaced parts never overlap. If you wanted something like ['multiplqqee', 'multipleqqe', 'multipleeqq'], you would have to store the position of the "first" occurrence of char. You can then restore pos to just after that position in the if multiples == num: block.
If you have further questions, feel free to ask. :)
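The overlapping variant can also be sketched by collecting the positions first and sliding a window over them. This is a different technique from restoring pos as described above, and `sliding_replacements` is a hypothetical name. It assumes char is a single character and that a "window" means num consecutive occurrences, which need not be adjacent in the word.

```python
def sliding_replacements(word, char, repl, num):
    """Replace every window of num consecutive occurrences of char with repl."""
    positions = [i for i, ch in enumerate(word) if ch == char]
    results = []
    # One result per window; windows overlap because the start moves by one occurrence.
    for start in range(len(positions) - num + 1):
        chars = list(word)
        for p in positions[start:start + num]:
            chars[p] = repl
        results.append("".join(chars))
    return results

print(sliding_replacements("multipleeee", "e", "q", 2))
# prints ['multiplqqee', 'multipleqqe', 'multipleeqq']
```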
Because you need to replace letters one at a time, this doesn't sound like a good problem to solve with pandas, since pandas is about doing everything at once (vectorized operations). I would dump out your DataFrame into a plain old list and use list operations:
words = DF.to_dict()["Words"].values()
for find, replace in reversed(sorted(mydict.items())):
    for word in words:
        occurrences = word.count(find)
        if not occurrences:
            print(word)
            continue
        start_index = 0
        for i in range(occurrences):
            for replace_char in replace.split(","):
                # Work on a fresh copy so only one occurrence changes at a time.
                modified_word = list(word)
                index = modified_word.index(find, start_index)
                modified_word[index] = replace_char
                modified_word = "".join(modified_word)
                print(modified_word)
            start_index = index + 1
Which gives:
Words
Shirt
Blousq
Blousw
Swqater
Swwater
Sweatqr
Sweatwr
Words
Shirt
Blouse
Swezter
Instead of printing the words, you can append them to a list and re-create a DataFrame if that's what you want to end up with.
If you are looping, you need to update s at each cycle of the loop. You also need to loop over v.
mydict = {"e":"q,w"}
s=deduped
for k, v in mydict.items():
    for j in v:
        s = s.replace(k, j)
Then reassign it to your dataframe:
df["Words"] = s
If you can write this as a function that takes in a 1d array (list, numpy array etc...), you can use df.apply() to apply it to any column.
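A plain-Python function along those lines might look like the sketch below. `one_at_a_time` is a hypothetical name, the mapping format ("q,w" as a comma-separated string) follows the question, and it assumes every key is a single character.

```python
def one_at_a_time(words, mapping):
    """Return each word followed by its variants with exactly one character swapped."""
    out = []
    for word in words:
        out.append(word)
        for old, news in mapping.items():
            for pos, ch in enumerate(word):
                if ch != old:
                    continue
                # One new word per replacement character at this single position.
                for new in news.split(","):
                    out.append(word[:pos] + new + word[pos + 1:])
    return out

result = one_at_a_time(["Shirt", "Blouse", "Sweater"], {"e": "q,w", "a": "z"})
print(result)
```

The returned list can then be turned back into a column, for example with pd.DataFrame({"Words": result}).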
