We have a legacy system that exports reports as .txt files. In almost every case where a date is supplied, it immediately follows a currency amount with no separator, like this:
25.0002/14/18 (25 bucks on Feb 14th) or 287.4312/08/17.
Is there an easy way in Python to find the . and insert a space two characters to the right of it, to separate the string? Any help is greatly appreciated!
The code below adds a space between the currency amount and the date in a given string.
import re
my_file_text = "This is some text 287.4312/08/17"
# raw strings (r"...") keep the backslashes intact for the regex engine
new_text = re.sub(r"(\d+\.\d{2})(\d{2}/\d{2}/\d{2})", r"\1 \2", my_file_text)
print(new_text)
OUTPUT
This is some text 287.43 12/08/17
REGEX
(\d+\.\d{2}): This part of the regex captures the currency amount in its own group. It assumes there is at least one digit before the . and exactly two digits after it, so something like 1000.25 is captured correctly, while 1000.205 and .25 are not.
(\d{2}/\d{2}/\d{2}): This part captures the date. It assumes that the day, month and year portions of the date are always written with exactly two digits each and separated by /.
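Since the reports come from .txt files, the same pattern can be applied to a whole file's contents at once; re.sub replaces every match, not just the first. A minimal sketch, assuming every fused value follows the two-decimal amount and two-digit date format described above. The function name separate_currency_dates and the file names in the comment are hypothetical:

```python
import re

# Amount with exactly two decimals, immediately followed by a dd/dd/dd date
CURRENCY_DATE = re.compile(r"(\d+\.\d{2})(\d{2}/\d{2}/\d{2})")


def separate_currency_dates(text):
    """Insert a space between every fused amount and date in `text`."""
    return CURRENCY_DATE.sub(r"\1 \2", text)


report = "Payment 25.0002/14/18\nRefund 287.4312/08/17\nNo date here 3.50\n"
print(separate_currency_dates(report))

# For a real report file (hypothetical file names):
# with open("report.txt") as src, open("report_fixed.txt", "w") as dst:
#     dst.write(separate_currency_dates(src.read()))
```

Lines without a fused date, such as "No date here 3.50", are left unchanged because the pattern requires the date right after the two decimals.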
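There are perhaps more efficient methods, but an easy way could be:
def fix(string):
    if '.' in string:
        # split only at the first '.', so extra dots later on don't raise ValueError
        part_1, part_2 = string.split('.', 1)
        # the first two characters after the '.' are the cents; add a space after them
        part_2_fixed = part_2[:2] + ' ' + part_2[2:]
        string = part_1 + '.' + part_2_fixed
    return string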
In [1]: string = '25.0002/14/18'
In [2]: fix(string)
Out[2]: '25.00 02/14/18'
Related
I have a string and need to get only the digits from it.
url = "www.mylocalurl.com/edit/1987"
Now from that string, I need to get 1987 only.
I have been trying this approach,
id = [int(i) for i in url.split() if i.isdigit()]
But I only get an empty list [].
You can use a regex to get the digits alone in a list:
import re
url = "www.mylocalurl.com/edit/1987"
digit = re.findall(r'\d+', url)
print(digit)
output:
['1987']
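findall() returns the matches as strings, so if you need the ID as a number, convert it. A small sketch, assuming the ID is the last run of digits in the URL (url_id is a hypothetical name):

```python
import re

url = "www.mylocalurl.com/edit/1987"

# findall() returns every run of digits as a string
digits = re.findall(r"\d+", url)
print(digits)  # ['1987']

# Assumption: the ID is the final number in the URL; None if there are no digits
url_id = int(digits[-1]) if digits else None
print(url_id)  # 1987
```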
Replace all non-digits with an empty string (effectively "deleting" them):
import re
url = "www.mylocalurl.com/edit/1987"
num = re.sub(r'\D', '', url)  # num is now '1987'
You aren't getting anything because, by default, the .split() method splits a string wherever there is whitespace. The URL you are splitting contains no spaces, so nothing gets split, and the single resulting element "www.mylocalurl.com/edit/1987" is not all digits, so isdigit() filters it out. What you can do instead is use a regex capture group. For example:
import re
url = "www.mylocalurl.com/edit/1987"
regex = r'(\d+)'
numbers = re.search(regex, url)
captured = numbers.groups()[0]
If you don't know what regular expressions are, the code basically says: using the regex string r'(\d+)', which means "capture one or more digits", search through the url. captured then holds the first captured group, which is the string '1987'.
If you don't want to use a regex, you can use your .split() method, but this time pass '/' as the separator. For example, url.split('/') returns ['www.mylocalurl.com', 'edit', '1987'].
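With '/' as the separator, the list comprehension from the question works as intended, because '1987' becomes its own element and passes isdigit(). A short sketch:

```python
url = "www.mylocalurl.com/edit/1987"

parts = url.split('/')
print(parts)  # ['www.mylocalurl.com', 'edit', '1987']

# The comprehension from the question, applied to the '/'-separated parts
ids = [int(part) for part in parts if part.isdigit()]
print(ids)  # [1987]
```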
I am making a small project in Python that lets you make notes and then read them back by using specific arguments. I attempted to write an if statement that checks whether the string contains a comma; if it does, my Python file should find the comma, take the characters next to it, turn them into integers, and print the notes the user created in that user-defined range.
If that didn't make sense, basically I want to find out which line of code is causing this to return nothing even though notes.txt has content.
Here is what I have in my python file:
if "," not in no_cs:  # no_cs is the string I am searching through
    user_out = int(no_cs[6:len(no_cs) - 1])
    notes = open("notes.txt", "r")  # notes.txt is the file that stores all the notes the user makes
    notes_lines = notes.read().split("\n")  # this is supposed to split all the notes into a list
    try:
        print(notes_lines[user_out])
    except IndexError:
        print("That line does not exist.")
    notes.close()
elif "," in no_cs:
    user_out_1 = int(no_cs.find(',') - 1)
    user_out_2 = int(no_cs.find(',') + 1)
    notes = open("notes.txt", "r")
    notes_lines = notes.read().split("\n")
    print(notes_lines[user_out_1:user_out_2])  # this is SUPPOSED to list all notes in a specific range but doesn't
    notes.close()
Now here is the notes.txt file:
note
note1
note2
note3
And lastly, here is what I get in the console when I run the program and type notes(0,2):
>>> notes(0,2)
jeffv : notes(0,2)
[]
A great way to do this is to use Python's str.partition() method. It splits a string at the first occurrence of a separator and returns a 3-tuple: index 0 is the part before the separator, index 1 is the separator itself, and index 2 is the part after the separator:
# The whole string we wish to search. Let's use a
# Monty Python quote since we are using Python :)
# (adjacent string literals are joined, so the space after "things" is kept)
whole_string = ("We interrupt this program to annoy you and make things "
                "generally more irritating.")
# Here is the word we wish to split the string on
first_split = 'program'
# partition() returns (before, separator, after); [2] picks what comes after the split word
substring_split = whole_string.partition(first_split)[2]
# now we take the first character after that split word
first_character = substring_split[0]
# since the above is a space, let's also show the second character so
# that it is less confusing :)
second_character = substring_split[1]
# Output
print("Here is the whole string we wish to split: " + whole_string)
print("Here is the split word we want to find: " + first_split)
print("Here is everything that occurs after our split word:" + substring_split)
print("The first character after the split word is: " + first_character)
print("The second character after the split word is: " + second_character)
output
Here is the whole string we wish to split: We interrupt this program to annoy you and make things generally more irritating.
Here is the split word we want to find: program
Here is everything that occurs after our split word: to annoy you and make things generally more irritating.
The first character after the split word is:
The second character after the split word is: t
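Applied to the question's notes(0,2) command: no_cs.find(',') returns the position of the comma (7 in "notes(0,2)"), not the digits next to it, so the slice becomes notes_lines[6:8], which is empty for a four-line file. A sketch that uses partition() to pull out the two numbers instead; parse_range is a hypothetical helper, and it assumes the command always has the form name(start,end) with no spaces:

```python
def parse_range(command):
    """Turn a command like 'notes(0,2)' into two ints (start, end).

    Hypothetical helper; assumes the exact form name(start,end).
    """
    inside = command.partition('(')[2].partition(')')[0]  # '0,2'
    start, _, end = inside.partition(',')                  # '0', ',', '2'
    return int(start), int(end)


# Stand-in for the contents of notes.txt
notes_lines = ["note", "note1", "note2", "note3"]

start, end = parse_range("notes(0,2)")
print(notes_lines[start:end])  # ['note', 'note1']
```

Python slices exclude the end index, so notes(0,2) shows note and note1; use notes_lines[start:end + 1] if you want the end of the range to be inclusive.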
I'm a novice in Python programming and I'm trying to split a full name into first name and last name. Can someone assist me with this? My example file is:
Sarah Simpson
I expect output like this: Sarah,Simpson
You can use the split() function like so:
fullname=" Sarah Simpson"
fullname.split()
which will give you: ['Sarah', 'Simpson']
Building on that, you can do:
first=fullname.split()[0]
last=fullname.split()[-1]
print(first + ',' + last)
which would give you Sarah,Simpson with no spaces
This comes in handy: nameparser 1.0.6 - https://pypi.org/project/nameparser/
>>> from nameparser import HumanName
>>> name = "Sarah Simpson"
>>> name = HumanName(name)
>>> name.last
'Simpson'
>>> name.first
'Sarah'
>>> name.last+', '+name.first
'Simpson, Sarah'
You can try the .split() method, which returns a list of strings after splitting by a separator. In this case the separator is a space character.
First remove leading and trailing spaces using .strip(), then split by the separator:
first_name, last_name = fullname.strip().split()
Strings in Python are immutable, so create a new string to get the desired output.
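Unpacking into first_name, last_name raises a ValueError when a line has more or fewer than two words (for example, a middle name). Since the question mentions a file of names, here is a sketch that takes the first and last words instead; to_first_last and names.txt are hypothetical names. Note that split() with no argument already ignores leading and trailing whitespace, including the newline at the end of each line read from a file:

```python
def to_first_last(fullname):
    """Return 'First,Last' from a full name, ignoring any middle names."""
    parts = fullname.split()  # no argument: splits on runs of whitespace
    return parts[0] + ',' + parts[-1]


print(to_first_last("Sarah Simpson"))         # Sarah,Simpson
print(to_first_last("  Sarah Jane Simpson"))  # Sarah,Simpson

# Reading one name per line from a file (hypothetical file name):
# with open("names.txt") as f:
#     for line in f:
#         if line.strip():  # skip blank lines
#             print(to_first_last(line))
```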
You can use split() method of string class.
name = "Sarah Simpson"
name.split()
By default, split() splits on whitespace, and it also accepts a separator as a parameter. It returns a list:
["Sarah", "Simpson"]
Then just concatenate the strings. For more reference, see https://docs.python.org/3.7/library/stdtypes.html?highlight=split#str.split
Output = "Sarah", "Simpson"
name = "Thomas Winter"
LastName = name.split()[1]
(Note the parentheses on the function call split().)
split() creates a list whose elements are the parts of your original string, delimited by whitespace. You can then grab the second element using name.split()[1] or the last element using name.split()[-1].
split() is the obvious method to go for; it can take one parameter or none:
fullname="Sarah Simpson"
ls=fullname.split()
ls=fullname.split(" ") #this will split by specified space
Extra Optional
And if you want the split name to be shown as a string delimited by a comma, you can use join() or replace():
print(",".join(ls))  # outputs Sarah,Simpson
print(fullname.replace(" ", ","))  # also outputs Sarah,Simpson
Input: Sarah Simpson => suppose it is a string.
Then, to output Sarah, Simpson, do the following:
name_surname = "Sarah Simpson".split(" ")
to_output = name_surname[0] + ", " + name_surname[-1]
print(to_output)
The split method is called on a string to split it by the separator passed to it. It returns a list of the substrings (here, words) between the separators.
In your case the string is "Sarah Simpson", so when you call split with the argument " " (a single space), the output will be ["Sarah", "Simpson"].
Now, to combine the names or to access either of them, write the name of the list followed by square brackets containing the index of the desired word. For example, name_surname[0] returns "Sarah", since its index in the list is 0.
I need to find special characters in an entire dataframe.
In my data frame, some columns contain special characters. How can I find which columns contain special characters?
I want to display the text in each column that contains special characters.
You can set up an alphabet of valid characters, for example:
import string
alphabet = string.ascii_letters+string.punctuation
Which is
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
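And just use the following. strip() removes alphabet characters from both ends of each value, so any text left over means the value contains at least one character outside the alphabet: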
df.col.str.strip(alphabet).astype(bool).any()
For example,
import pandas as pd
df = pd.DataFrame({'col1':['abc', 'hello?'], 'col2': ['ÃÉG', 'Ç']})
col1 col2
0 abc ÃÉG
1 hello? Ç
Then, with the above alphabet,
df.col1.str.strip(alphabet).astype(bool).any()
False
df.col2.str.strip(alphabet).astype(bool).any()
True
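To also answer the "which columns" and "display the text" parts of the question, the same strip() check can be run over every column. A sketch, assuming pandas; columns_with_special_chars and ALLOWED are hypothetical names, every value is compared as text via astype(str), and here digits and spaces are also treated as normal characters (adjust ALLOWED to your own definition). It uses .str.len() > 0, which is equivalent to the astype(bool) test above:

```python
import string

import pandas as pd

# Assumption: letters, digits, punctuation and spaces count as "normal";
# anything else (accented letters, symbols, ...) counts as special.
ALLOWED = string.ascii_letters + string.digits + string.punctuation + " "


def columns_with_special_chars(df, allowed=ALLOWED):
    """Map each column name to the values containing a character outside `allowed`."""
    found = {}
    for col in df.columns:
        values = df[col].astype(str)  # compare everything as text
        # strip() removes allowed characters from both ends, so any leftover
        # text means the value has at least one disallowed character
        mask = values.str.strip(allowed).str.len() > 0
        if mask.any():
            found[col] = values[mask].tolist()
    return found


df = pd.DataFrame({'col1': ['abc', 'hello?'], 'col2': ['ÃÉG', 'Ç']})
for col, values in columns_with_special_chars(df).items():
    print(col, values)  # col2 ['ÃÉG', 'Ç']
```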
The term "special characters" can be tricky, because it depends on your interpretation. For example, you might or might not consider # to be a special character. Also, some languages (such as Portuguese) have characters like ã and é, while others (such as English) do not.
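To remove unwanted characters from dataframe columns, use a regex:
import re

def strip_character(value):
    # keep letters, spaces and the listed punctuation; delete everything else
    # (the hyphen is escaped so it is literal, not a +-= character range)
    r = re.compile(r'[^a-zA-Z !@#$%&*_+\-=|:";<>,./()[\]{}\']')
    return r.sub('', value)

# dataCol / resultCol are placeholders for your input and output column names
df[resultCol] = df[dataCol].apply(strip_character)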
# Whitespace could also be considered unwanted in some cases.
import string
unwanted = string.ascii_letters + string.punctuation + string.whitespace
print(unwanted)
# This helped me extract '10' from '10+ years'.
df.col = df.col.str.strip(unwanted)
I searched, but mostly found things about removing spaces. I'm brand spanking new to Python and trying to write a simple program that asks for a first name and a last name and then prints a greeting. No matter how many spaces I put between name + last on the print line, it keeps mashing the first and last name together.
name = input("What is your first name?: ")
last = input("What is your last name?: ")
print('Nice to meet you,', name + last)
It outputs:
What is your first name?: Jessie
What is your last name?: Jackson
Nice to meet you, JessieJackson
What am I doing wrong?
There are several ways to get the desired output:
Concatenating strings
If you want to concatenate your strings, use the + operator.
It joins your strings EXACTLY the way you provide them in your code; no spaces are added.
Example:
>>> stringA = 'This is a'
>>> stringB = 'test'
>>> print(stringA + stringB)
This is atest
>>> print(stringA + ' ' + stringB)
This is a test
Printing on the same line
If you simply want to print multiple strings on the same line, you can pass them to the print function as separate arguments, separated by commas (,).
Example:
>>> print('I want to say:', stringA, stringB)
I want to say: This is a test
Formatting strings
The most common way is string formatting. This can be done in several ways, for example:
- Using the format function
- Using the 'old' way with %s
Example:
>>> print('Format {} example {}'.format(stringA, stringB))
Format This is a example test
>>> print('Old: %s example %s of string formatting' % (stringA, stringB))
Old: This is a example test of string formatting
Of course those examples can be combined in any way you want.
Example:
>>> stringC = 'normally'
>>> print((('%s strange {} no one ' % stringA) + stringC).format(stringB), 'uses')
This is a strange test no one normally uses
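Since Python 3.6 there is also f-string formatting, which puts the expressions directly inside the braces. A short sketch with the same stringA and stringB values as above, plus the names from the question:

```python
stringA = 'This is a'
stringB = 'test'

# f-strings (Python 3.6+) evaluate the expressions inside the braces
message = f'Format {stringA} example {stringB}'
print(message)  # Format This is a example test

# Applied to the question: the space between the names is written explicitly
name, last = 'Jessie', 'Jackson'
greeting = f'Nice to meet you, {name} {last}'
print(greeting)  # Nice to meet you, Jessie Jackson
```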
You can use + to append a string literal containing a space like this:
print ('Nice to meet you, ' + name + ' ' + last)
If you don't need to concatenate them, you could use:
print("Nice to meet you,", name, last)
outputting:
Nice to meet you, Jessie Jackson
This works because + concatenates strings with nothing in between, whereas passing separate arguments to print() (separated by commas) prints them on the same line with a space between each one.
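The space comes from print()'s sep parameter, which defaults to a single space. A short sketch showing that print(a, b, c) writes the same text as " ".join([a, b, c]), and that sep can be changed:

```python
name, last = 'Jessie', 'Jackson'

# print() puts sep (a single space by default) between its arguments,
# so both lines below print: Nice to meet you, Jessie Jackson
print("Nice to meet you,", name, last)
greeting = " ".join(["Nice to meet you,", name, last])
print(greeting)

# Any other separator can be passed explicitly
print(name, last, sep="-")  # Jessie-Jackson
```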