remove special character from string in python - python-3.x

I have a string variable whose value is given below:
string_value = "hello ' how ' are - you ? and/ nice to % meet # you"
Expected result:
hello how are you and nice to meet you

You could try just removing all non-word characters:
import re
string_value = "hello ' how ' are - you ? and/ nice to % meet # you"
output = re.sub(r'\s+', ' ', re.sub(r'[^\w\s]+', '', string_value))
print(string_value)
print(output)
This prints:
hello ' how ' are - you ? and/ nice to % meet # you
hello how are you and nice to meet you
The solution I used first targets all non-word characters (except whitespace) using the pattern [^\w\s]+. Removing them can leave behind clusters of two or more spaces, so a second call to re.sub collapses every run of whitespace into a single space.
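The same cleanup can also be sketched with a single re.sub followed by str.split() and " ".join(), which collapses whitespace runs and trims the ends in one step. This is an alternative to the second re.sub above, not the answer's own code; note that \w also keeps digits and underscores.

```python
import re

string_value = "hello ' how ' are - you ? and/ nice to % meet # you"

# Drop every character that is neither a word character nor whitespace,
# then split() / " ".join() collapse the leftover runs of spaces.
cleaned = " ".join(re.sub(r"[^\w\s]", "", string_value).split())
print(cleaned)  # hello how are you and nice to meet you
```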

Related

Move a character or word to a new line

Given a string, how do I move part of it onto a new line without moving the rest of the line?
For example, in 'This and this word should go in the next line', the words 'This' and 'this' should go on the next line.
Output:
and word should go in the next line
This this
This is just an example of the output I want; the actual words can differ. To be clearer: say I have some string elements in a list, and I have to move the second and third word of each element to a new line while printing the rest of the line as is. I've tried using \n and a for loop, but that also moves the rest of the string to a new line.
['This and this', 'word should go', 'in the next']
Output:
This word in
and this should go the next
So the 2nd and 3rd words of each element are moved without affecting the rest of the line. Is it possible to do this without much complication? I'm aware of the format method, but I don't know how to use it in this situation.
For your first example, in case you don't know the order of the target words in advance, I would use a dictionary to store the indices of the found words. Then you can sort those to put the found words in the second line in the same order as they appeared in the text:
targets = ['this', 'This']
source = 'This and this word should go in the next line.'
target_ixs = {source.find(target): target for target in targets}
line2 = ' '.join([target_ixs[i] for i in sorted(target_ixs)])
line1 = source
for target in targets:
    line1 = line1.replace(target, '')
line1 = line1.replace('  ', ' ').lstrip()  # collapse the double space left behind
result = line1 + '\n' + line2
print(result)
and word should go in the next line.
This this
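One caveat with str.replace is that it removes the target text anywhere, including inside longer words (for example 'this' inside 'thistle'). A sketch that compares whole words instead, assuming words are separated by spaces and the match should be case-insensitive:

```python
source = "This and this word should go in the next line."
targets = {"this"}  # compared against lowercased words (assumption)

words = source.split()
# Both lines keep the original word order; only the lowercase form is compared.
line1 = " ".join(w for w in words if w.lower() not in targets)
line2 = " ".join(w for w in words if w.lower() in targets)
print(line1 + "\n" + line2)
```

Punctuation stays attached to words here, so a token like 'this.' would not match 'this' without extra stripping.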
Your second example is easier, because you already know which parts of the strings to put in the second line, so you just need to split each string into a list of words and select from those:
source = ['This and this', 'word should go', 'in the next']
source_lists = [s.split() for s in source]
line1 = ' '.join([source_list[0] for source_list in source_lists])
line2 = ' '.join([' '.join(source_list[1:]) for source_list in source_lists])
result = line1 + '\n' + line2
print(result)
This word in
and this should go the next
You can probably do quite a bit without much complication using the regular expression library and some python language features. That being said, it depends on how complex the rules are for determining what words go where. Typically, you want to start with a string and "tokenize" it into the constituent words. See the code example below:
import re
sentence = "This and this word should go in the next line"
all_words = re.split(r'\W+', sentence)
matched = re.findall(r"this", sentence, re.IGNORECASE)
matched_words = " ".join(matched)
# compare against the list of matches, not the joined string, to avoid substring hits
unmatched_words = " ".join([word for word in all_words if word not in matched])
print(f"{unmatched_words}\n{matched_words}")
and word should go in the next line
This this
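A small check of why the tokenizing regex matters: if the sentence ends with punctuation, re.split(r'\W+', ...) leaves an empty string at the end of the list, while re.findall(r'\w+', ...) returns only the words. The sentence with a trailing period is an assumed input for illustration.

```python
import re

sentence = "This and this word should go in the next line."

# Splitting on runs of non-word characters: the trailing "." produces
# a final empty string in the result.
split_words = re.split(r"\W+", sentence)

# Finding runs of word characters returns the words only.
found_words = re.findall(r"\w+", sentence)

print(split_words[-3:])  # ['next', 'line', '']
print(found_words[-3:])  # ['the', 'next', 'line']
```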
Final Thoughts:
I am by no means a regex ninja, so there may be even cleverer things that can be done with regex patterns and functions alone. Hopefully this gives you some food for thought, at least.
Got it:
data = ['This and this', 'word should go', 'in the next']
first_line = []
second_line = []
for item in data:
    item = item.split(' ')
    first_word = item[0]
    item.remove(first_word)
    others = " ".join(item)
    first_line.append(first_word)
    second_line.append(others)
print(" ".join(first_line) + "\n" + " ".join(second_line))
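A slightly shorter sketch of the same idea uses str.partition, which splits each item at its first space only and still works for single-word items (they simply contribute an empty string to the second line):

```python
data = ['This and this', 'word should go', 'in the next']

first_line, second_line = [], []
for item in data:
    # partition(" ") returns (before, " ", after) for the first space only
    first_word, _, others = item.partition(" ")
    first_line.append(first_word)
    second_line.append(others)

print(" ".join(first_line) + "\n" + " ".join(second_line))
```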
My Solution:
input_data = ['This and this', 'word should go ok', 'this next']
I've slightly altered your test string to better test the code.
# Example 1
# Print all words in input_data, moving any word matching the
# string "this" (match is case insensitive) to the next line.
print('Example 1')
lines = ([], [])
for words in input_data:
    for word in words.split():
        lines[word.lower() == 'this'].append(word)
result = ' '.join(lines[0]) + '\n' + ' '.join(lines[1])
print(result)
The code in example 1 sorts each word into one of the two lists in the 2-element tuple lines. The key part is the boolean expression that performs the string comparison.
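To make the indexing trick concrete: bool is a subclass of int, so False selects lines[0] and True selects lines[1]. A minimal demonstration with one input string:

```python
lines = ([], [])
for word in "This and this".split():
    # word.lower() == 'this' evaluates to False (0) or True (1),
    # which picks the list to append to.
    lines[word.lower() == 'this'].append(word)
print(lines)  # (['and'], ['This', 'this'])
```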
# Example 2
# Print all words in input_data, moving the second and third
# word in any string to the next line.
from itertools import count
print('\nExample 2')
lines = ([], [])
for words in input_data:
    for q in zip(count(), words.split()):
        lines[q[0] in (1, 2)].append(q[1])
result = ' '.join(lines[0]) + '\n' + ' '.join(lines[1])
print(result)
The next solution is basically the same as the first. I zip each word with an integer so you know the word's position when you reach the boolean expression, which again sorts the words into their appropriate list in lines.
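The built-in enumerate() produces the same (position, word) pairs as zip(count(), ...) without the itertools import. A sketch of Example 2 rewritten that way, using the same altered test data:

```python
input_data = ['This and this', 'word should go ok', 'this next']

lines = ([], [])
for words in input_data:
    # enumerate() yields (position, word) pairs starting at 0
    for pos, word in enumerate(words.split()):
        lines[pos in (1, 2)].append(word)
result = ' '.join(lines[0]) + '\n' + ' '.join(lines[1])
print(result)
# This word ok this
# and this should go next
```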
As you can see, this solution is fairly flexible and can be adjusted to fit a number of scenarios.
Good luck, and I hope this helped!

Find a better way to find text in a string that contains multiple identical separators

I have the text below, in which each field between "|" (its content and its length) changes over time; only the number of "|" is fixed. I can retrieve the info I want ("XYZGM"), but is there a better way to do it?
"{#BATCH|ABCDEF|01|12|1||XYZGM|210401113439|online|ATGHDGV03|QGH83826|RevA|||"
Current code i used:
text="{#BATCH|ABCDEF|01|12|1||XYZGM|210401113439|online|ATGHDGV03|QGH83826|RevA|||"
# get text from 6th position to 7th position of "|"
pos_count=0
z=0
for i in range(z, len(text)):
    pos = text.find('|', z, len(text))
    if pos > 0:
        pos_count += 1
        z = pos + 1
        if pos_count == 6:
            x = pos + 1
        if pos_count == 7:
            y = pos
            break
print("X: {}, Y: {}".format(x,y))
result=text[x:y]
print(result)
and the result is : "XYZGM"
Another option could be using a pattern:
^{#(?:[^|]*\|){6}([^|]+)
^ Start of string
{# Match {#
(?:[^|]*\|){6} Repeat 6 times any char except | then match |
([^|]+) Capture group 1, match 1+ times any char except |
import re
pattern = r"^{#(?:[^|]*\|){6}([^|]+)"
s = "{#BATCH|ABCDEF|01|12|1||XYZGM|210401113439|online|ATGHDGV03|QGH83826|RevA|||"
match = re.match(pattern, s)
if match:
    print(match.group(1))
Output
XYZGM
No need to use regex:
text="{#BATCH|ABCDEF|01|12|1||XYZGM|210401113439|online|ATGHDGV03|QGH83826|RevA|||"
if text.startswith("{#"):
    print(text[2:].split("|")[6])
Make sure the text starts with {#, split the rest on |, and take the element at index 6 (the seventh field).
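If some records might be cut short or lack the {# prefix, indexing [6] directly would raise an IndexError. A sketch of a guarded version; get_field is a hypothetical helper name, and returning None for a missing field is an assumed convention:

```python
def get_field(text, index, prefix="{#"):
    """Return the '|'-separated field at `index` after `prefix`, or None.

    Hypothetical helper: returns None if the prefix is missing or the
    record has too few fields.
    """
    if not text.startswith(prefix):
        return None
    fields = text[len(prefix):].split("|")
    return fields[index] if 0 <= index < len(fields) else None


text = "{#BATCH|ABCDEF|01|12|1||XYZGM|210401113439|online|ATGHDGV03|QGH83826|RevA|||"
print(get_field(text, 6))          # XYZGM
print(get_field("{#BATCH|AB", 6))  # None
```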

Print a string in python like how dot-matrix printer works

I want to print any English letter(s) using '*' or any other given special character, the way a dot-matrix printer works.
I came up with a function def printLetters(string, font_size, special_char): which, when passed a letter, prints that letter using the specified special character.
Consider the letter 'A':
printLetters('A', 10, '&')  # would print the letter A within a 10x10 matrix using '&'
 &&&&&&&&
&        &
&        &
&        &
&&&&&&&&&&
&        &
&        &
&        &
&        &
&        &
and such code snippets for every character.
Example for 'A':
def printLetters(string, font_size, special_char):
    space = ' '
    # print the top bar
    print('', special_char*(font_size-2))
    for i in range(1, font_size-1):
        if font_size//2 == i:
            # print the crossbar
            print(special_char*font_size)
        # print the two sides
        print(special_char, space*(font_size-2), special_char, sep='')
printLetters('A', 10, "&")
But when the string parameter has more than one character, it prints gibberish after the first character.
So I just wanted some ideas or code snippets that would print the first row of all characters in the string first, and so on until the last row, so that all the characters line up side by side horizontally on the console.
Ah, fond memories. We used to do this sort of thing all the time in the olden days before GUIs. It is good that you want to define the shape of each letter separately, but making a separate function to print each letter is obviously not useful. After you print the N lines of one letter, you have no way to get back to the top.
You know that you need to print the top line of ALL letters, then the 2nd line of ALL letters, etc. I don't want to write you a complete example, because it's pretty important for you to learn how to figure this kind of thing out, but why don't you start with something like this:
font = {
    'A': [
        ' ###  ',
        '#   # ',
        '##### ',
        '#   # ',
        '#   # '
    ],
    'B': [
        '####  ',
        '#   # ',
        '####  ',
        '#   # ',
        '####  '
    ],
    # ...etc...
}
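A minimal sketch of the row-by-row printing described above, assuming every row of every letter has the same width as in the font dictionary. render is a hypothetical name; it builds row 0 of all letters, then row 1, and so on, and swaps '#' for the requested character:

```python
font = {
    'A': [' ###  ', '#   # ', '##### ', '#   # ', '#   # '],
    'B': ['####  ', '#   # ', '####  ', '#   # ', '####  '],
}


def render(text, special_char='#'):
    """Return one string per row, with the letters of `text` side by side."""
    height = len(font['A'])  # all glyphs are assumed to share this height
    rows = []
    for row in range(height):
        # Row `row` of every letter, joined left to right
        line = ''.join(font[letter][row] for letter in text.upper())
        rows.append(line.replace('#', special_char))
    return rows


for line in render('ABBA', '&'):
    print(line)
```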

Need to use a 'for loop' for this one. The user has to enter a sentence and any spaces must be replaced with "%"

The input:
sentence = input("Please enter a sentence:")
The for loop (incorrect here):
for i in sentence:
    print(sentence)
space_loc = sentence.index(" ")
for c in sentence:
    print(space_loc)
for b in range(space_loc):
    print("%")
I'm confused about how to get the answer out.
You can use string concatenation and slicing for this one.
sentence = input()
After taking the input, store the length of your string:
length = len(sentence)
Then iterate through every character in the string. When you find a " ", split the string with slicing into the part before and the part after the " ", and join them with a "%":
for i in range(length):
    if sentence[i] == " ":
        sentence = sentence[:i] + "%" + sentence[i+1:]
print(sentence)
Here, sentence[:i] is the part of string before the space and sentence[i+1:] is the part of string after the space.
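Since the exercise asks for a for loop, another option is to build a new string one character at a time instead of re-slicing the original. A sketch, with a fixed sample sentence standing in for input():

```python
sentence = "Hello there coders!"  # stands in for input(...)

result = ""
for ch in sentence:
    # Copy each character, swapping a space for '%'
    result += "%" if ch == " " else ch
print(result)  # Hello%there%coders!
```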
One way of solving your query:
Code
sentence = input("Please enter a sentence:")
ls=sentence.split() #Creating a list of words present in sentence
new_sentence='%'.join(ls) #Joining the list with '%'
print(new_sentence)
Output
Please enter a sentence:Hello there coders!
Hello%there%coders!
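One behaviour to be aware of: split() with no argument splits on runs of whitespace and drops leading and trailing whitespace, so repeated or trailing spaces do not each become a '%'. Passing an explicit separator keeps every space. A comparison using an assumed sample string:

```python
sentence = "Hello  there coders! "  # two spaces after Hello, one at the end

collapsed = '%'.join(sentence.split())  # runs of whitespace treated as one
exact = '%'.join(sentence.split(" "))   # every single space becomes '%'
print(collapsed)  # Hello%there%coders!
print(exact)      # Hello%%there%coders!%
```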
EDIT
I do not understand how exactly you want to use the for loop here. If you just want to include a for loop (no restrictions), then you can do this:
Code
ls = []
a = 0
sentence = input("Please enter a sentence:")
# This loop finds the words in the sentence and stores them in a list.
# Words are determined by checking the white space. Each space is replaced with '%'.
for i in range(0, len(sentence)):
    if sentence[i] == ' ':
        ls.append(sentence[a:i])
        a = i
        ls.append('%')
ls.append(sentence[a:])  # This is to save the last word
ls1 = []
for i in ls:  # Removing any white space inside the list
    j = i.replace(' ', '')
    ls1.append(j)
print(''.join(ls1))  # Displaying final output
Again, your question is very open ended and this is just one way of using for loop to get the desired result!

Get only one word from line

How can I take only one word from a line in a file and save it in a string variable?
For example, my file has the line "this, line, is, super" and I want to save only the first word ("this") in the variable word. I tried to read it character by character until I reached ",", but when I check it I get the error "argument of type 'int' is not iterable". How can I do this?
line = file.readline()  # reading "this, line, is, super"
if "," in len(line):  # checking, if it contains ','
    for i in line:
        if "," not in line[i]:  # while character is not ',' -> this is where I get error
            word += line[i]  # add it to my string
You can do it like this, using split():
line = file.readline()
if "," in line:
    split_line = line.split(",")
    first_word = split_line[0]
    print(first_word)
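split() will create a list in which each element is, in your case, a word. The commas are not included, but every element after the first keeps its leading space (and the last one keeps the newline from readline()), so call .strip() on an element if you need it clean.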
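A sketch that cleans every element at once, or grabs only the first word with str.partition; the sample line assumes readline() returned the text with its trailing newline:

```python
line = "this, line, is, super\n"  # what readline() typically returns

# Split on commas, then strip surrounding whitespace from each piece
parts = [part.strip() for part in line.split(",")]
print(parts)  # ['this', 'line', 'is', 'super']

# partition() stops at the first comma, which is all that is needed here
first_word, _, _ = line.partition(",")
print(first_word)  # this
```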
At a glance you are on the right track, but there are a few problems that you can work out if you always consider what data type is stored where. First, the condition if "," in len(line) doesn't make sense, because it translates to if "," in 21: len() returns an int, and you cannot search inside an int. Second, you iterate over each character in line, but i is not what you think. You want the index of the character at each step of the loop so you can check whether it is ",", but i is the character itself, so line[i] is not line[0] as you might imagine; on the first pass it is line['t'], which raises an error. It is easy to assume that i is always an integer index into your string, but what you need is a range of integer values, one per character in the line, so you can look up the character at each index. I have reworked your code with these points in mind so that it produces word = "this" as intended. I hope you find this instructive (there are shorter ways and built-in methods to do this, but understanding indices is crucial in programming). Assuming line is the string "this, line, is, super":
word = ""  # start with an empty string
if "," in line:  # checking that the string, not the number 21, has a comma
    for i in range(0, len(line)):  # for each index from 0 to 20
        if line[i] != ",":  # e.g. if line[0] does not equal a comma
            word += line[i]  # add the character to your string
        else:
            break  # stop at the first comma, so only the first word is stored
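For comparison, iterating over a string directly (which is what the original for i in line was doing) gives you the characters themselves, so you can test each one without indexing. A sketch of the same loop written that way:

```python
line = "this, line, is, super"

word = ""
for ch in line:  # ch is a one-character string, not an index
    if ch == ",":
        break  # stop at the first comma
    word += ch
print(word)  # this
```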
