read_csv: naming the resulting dataframes - python-3.x

I want to read some CSVs from a given directory and name the resulting dataframes after the names of the CSV files.
So I wrote the code below, but I am aware that it is not the right syntax.
Besides, I get the error:
TypeError: 'str' object does not support item assignment
My code :
import os
for element in os.listdir('.'):
    element[:-4] = read_csv(element)
Thank you for your help

You can do that by tampering with the global scope as follows:
import os
import pandas as pd

for i in os.listdir('.'):
    globals()[i] = pd.read_csv(i)
But that's very ugly and, as @JonClements pointed out, won't work if the filename doesn't follow Python's variable naming rules. As a reminder, the naming rules are:
Variable names must start with a letter or an underscore, such as:
_underscore
underscore_
The remainder of your variable name may consist of letters, numbers and underscores.
password1
n00b
un_der_scores
Check this link for more explanation.
The best way is to create a dictionary:
import os
import pandas as pd

d = {}
for i in os.listdir('.'):
    d[i] = pd.read_csv(i)
Then you can access any dataframe you want as follows: d['file1.csv']
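If you would rather key the dictionary by the file name without its extension (matching the element[:-4] intent of the question), here is a minimal sketch, assuming pandas is installed and the CSVs live in the current directory:
import os
import pandas as pd

dataframes = {}
for filename in os.listdir('.'):
    if filename.endswith('.csv'):
        # key by the base name, e.g. 'file1' for 'file1.csv'
        dataframes[filename[:-4]] = pd.read_csv(filename)
With that, dataframes['file1'] gives you the dataframe read from file1.csv.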

Related

Automating The Boring Stuff With Python - Chapter 8 - Exercise - Regex Search

I'm trying to complete the exercise for Chapter 8, which takes a user-supplied regular expression and uses it to search each string in each text file in a folder.
I keep getting the error:
AttributeError: 'NoneType' object has no attribute 'group'
The code is here:
import os, glob, re

os.chdir("C:\Automating The Boring Stuff With Python\Chapter 8 - \
Reading and Writing Files\Practice Projects\RegexSearchTextFiles")
userRegex = re.compile(input('Enter your Regex expression :'))

for textFile in glob.glob("*.txt"):
    currentFile = open(textFile)  # open the text file and assign it to a file object
    textCurrentFile = currentFile.read()  # read the contents of the text file and assign to a variable
    print(textCurrentFile)
    #print(type(textCurrentFile))
    searchedText = userRegex.search(textCurrentFile)
    searchedText.group()
When I try this individually in the IDLE shell it works:
textCurrentFile = "What is life like for those left behind when the last foreign troops flew out of Afghanistan? Four people from cities and provinces around the country told the BBC they had lost basic freedoms and were struggling to survive."
>>> userRegex = re.compile(input('Enter the your Regex expression :'))
Enter the your Regex expression :troops
>>> searchedText = userRegex.search(textCurrentFile)
>>> searchedText.group()
'troops'
But I can't seem to make it work in the code when I run it. I'm really confused.
Thanks
Since you are looping across all .txt files, there could be files that don't contain the word "troops". To confirm this, don't call .group(); just run:
print(textFile, textCurrentFile, searchedText)
If you see that searchedText is None, that means the contents of textFile (which is textCurrentFile) don't contain the word "troops".
You could either:
Add the word troops to all .txt files.
Only select the target .txt files, not all of them.
Check first if the match is found before accessing .group():
print(searchedText.group() if searchedText else None)
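For instance, here is a minimal variant of the loop that only prints matches and skips files without one (a sketch of the guard, not the book's reference solution):
for textFile in glob.glob("*.txt"):
    with open(textFile) as currentFile:
        textCurrentFile = currentFile.read()
    searchedText = userRegex.search(textCurrentFile)
    if searchedText is not None:
        # only call .group() when a match object was actually returned
        print(textFile, searchedText.group())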

F string is adding new line

I am trying to make a name generator. I am using an f-string to concatenate the first and last names. But instead of getting them together, I am getting them on separate lines.
print(f"Random Name Generated is:\n{random.choice(firstname_list)}{random.choice(surname_list)}")
This gives the output:
Random Name Generated is:
Yung
heady
Instead of:
Random Name Generated is:
Yung heady
Can someone please explain why so?
The code seems right; the problem is likely newline (\n) characters in the list elements.
Check the strings in your lists.
import random

if __name__ == '__main__':
    firstname_list = ["yung1", "yung2", "yung3"]
    surname_list = ["heady1", "heady2", "heady3"]
    firstname_list = [name.replace('\n', '') for name in firstname_list]
    print(f"Random Name Generated is:\n{random.choice(firstname_list)} {random.choice(surname_list)}")
Output:
Random Name Generated is:
yung3 heady2
Since I had pulled these values from a UTF-8 encoded .txt file, readlines() did convert the names to list elements, but they had a hidden '\xa0\n' in them.
This caused this particular printing problem. Using .strip() removed those characters.
print(f"Random Name Generated is:\n{random.choice(firstname_list).strip()} {random.choice(surname_list).strip()}")

python iterating on multiple files

I have
file_2000.dta, file_2001.dta, file_2002.dta and so on.
I also have
file1_2000.dta, file1_2001.dta, file1_2002.dta and so on.
I want to iterate on the file year.
Let (year) = 2000, 2001, 2002, etc
import file_(year) using pandas.
import file1_(year) using pandas.
file_(year)['name'] = file_(year).index
file1_(year)['name'] = file1_(year).index2
merged = pd.merge(file_(year), file1_(year), on='name')
write/export merged_(year).dta
It seems to me that you need to use the read_stata function, based on your .dta extensions, to read the files in a loop, create a list of the separate dataframes to be able to work with them separately, and then concatenate all dataframes into one.
Something like:
import pandas as pd

list_of_files = ['file_2000.dta', 'file_2001.dta', 'file_2002.dta']  # full paths here...
frames = []
for f in list_of_files:
    df = pd.read_stata(f)
    frames.append(df)
consolidated_df = pd.concat(frames, axis=0, ignore_index=True)
These questions might be relevant to your case:
How to Read multiple files in Python for Pandas separate dataframes
Pandas read_stata() with large .dta files
As far as I know, there is no 'Let' keyword in Python. To iterate over multiple files in a directory you can simply use a for loop with the os module, like the following:
import os

directory = r'C:\Users\admin'
for filename in os.listdir(directory):
    if filename.startswith("file_200") and filename.endswith(".dta"):
        # do something with the matching file
        pass
    else:
        continue
Another approach is to use a regex to tell Python which file names to match during the iteration. The pattern could be: pattern = r"file_20\d+"
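To come back to the original per-year merge, here is a minimal sketch, assuming pandas is installed, the files sit in the current directory, and the 'name' merge key can be built from each file's index as described in the question (the key handling is illustrative):
import pandas as pd

for year in range(2000, 2003):
    df_a = pd.read_stata(f"file_{year}.dta")
    df_b = pd.read_stata(f"file1_{year}.dta")
    # build the common merge key from each file's index
    df_a['name'] = df_a.index
    df_b['name'] = df_b.index
    merged = pd.merge(df_a, df_b, on='name')
    # export the merged result for this year
    merged.to_stata(f"merged_{year}.dta")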

Python 3.6 pathlib Path change name parent directory

The Path class from the pathlib module, added in Python 3.4, seems a powerful replacement for approaches such as os.path.join(), but I'm having some trouble working with it.
I have a path that can be anything from
folder_foo/file.csv
to
long/path/to/folder_foo/file.csv
I read the .csv file in folder_foo with pandas, modify it and want to save it to
folder_bar/file.csv
or
long/path/to/folder_bar/file.csv
Essentially I want to rename folder_foo to folder_bar in the Path object.
EDIT: example path code
csv_path = Path("long/path/to/folder_foo/file.csv")
Attempts
1
csv_path.parents[0] = csv_path.parents[0] + "_clean"
This leads to the error TypeError: unsupported operand type(s) for +: 'PosixPath' and 'str', which means you cannot use + to combine a PosixPath with a str, as described in the question of the same name.
2
To solve this I tried the following:
csv_path.parents[0] = Path(str(csv_path.parents[0]) + "_clean")
This, however, results in the error TypeError: '_PathParents' object does not support item assignment.
Since PosixPath is not a list, this error is understandable.
3
Maybe .parts is a better approach, but
csv_path.parts[-2] = csv_path.parts[-2][:-3] + "bar"
results in: TypeError: 'tuple' object does not support item assignment.
Question
How can I easily rename the file's parent folder?
I would rather split this up for readability:
bar_folder = csv_path.parent.parent / 'folder_bar'
csv_path2 = bar_folder / csv_path.name
Having the destination folder as a variable also lets you create the folder if needed, for example:
bar_folder.mkdir(exist_ok=True)
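If the folder to rename is always the file's immediate parent, another option is Path.with_name on the parent; a short sketch using the example path from the question:
from pathlib import Path

csv_path = Path("long/path/to/folder_foo/file.csv")
# swap the parent directory's name, then re-append the file name
csv_path2 = csv_path.parent.with_name("folder_bar") / csv_path.name
# PosixPath('long/path/to/folder_bar/file.csv')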
You could also write a little function to replace the part of the path you want to change. Here's a runnable example:
from pathlib import Path

path1 = Path("a/b/c.txt")
path2 = Path("b/c.txt")

def rename_dir(path, src, dst):
    # convert to list so that we can change elements
    parts = list(path.parts)
    # replace part that matches src with dst
    parts[parts.index(src)] = dst
    return Path(*parts)

rename_dir(path1, 'b', 'q')
#> PosixPath('a/q/c.txt')
rename_dir(path2, 'b', 'q')
#> PosixPath('q/c.txt')
EDIT: Found a cleaner solution without str()
csv_path2 = csv_path.parents[1] / (csv_path.parts[-2][:-3] + "bar") / csv_path.parts[-1]
# result
PosixPath('long/path/to/folder_bar/file.csv')
Path.parents gives the ancestors of the path (everything except the file). csv_path.parents[1] goes two levels up (long/path/to/), which is still a Path object. Then we get the last folder name with csv_path.parts[-2], which is a string. We apply [:-3] to drop the last three characters ("foo"), leaving "folder_". With + "bar" we get "folder_bar", which is joined onto the Path object. Finally we re-append the file name with / csv_path.parts[-1].
Hack-like solution
csv_path = Path(str(csv_path.parents[0])[:-3] + 'bar/' + csv_path.parts[-1])
It seems a bit unintuitive to me, however. Is there a cleaner solution?

Why is str.translate() returning an error and how can I fix it?

import os
def rename_files():
    file_list = os.listdir(r"D:\360Downloads\test")
    saved_path = os.getcwd()
    os.chdir(r"D:\360Downloads\test")
    for file_name in file_list:
        os.rename(file_name, file_name.translate(None, "0123456789"))

rename_files()
The error message is TypeError: translate() takes exactly one argument (2 given). How can I write this so that translate() does not raise an error?
Hope this helps!
os.rename(file_name,file_name.translate(str.maketrans('','','0123456789')))
or
os.rename(file_name,file_name.translate({ ord(i) : None for i in '0123456789' }))
Explanation:
I think you're using Python 3.x with Python 2.x syntax. In Python 3.x the translate() signature is
str.translate(table)
which takes only one argument, unlike Python 2.x, where the signature is
str.translate(table[, deletechars])
which can take more than one argument.
We can make the translation table easily using the str.maketrans function.
In this case, the first two parameters map nothing to nothing, and the third parameter specifies which characters should be removed.
We can also build the translation table manually as a dictionary in which each key is the ordinal of the character to replace and each value is the ordinal of the replacement character. If we want to remove a character, its value must be None.
For example, if we want to replace 'A' with 'a' and remove '1' from a string, our dictionary looks like this:
{65: 97, 49: None}
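For example, a quick check of that mapping on an illustrative string:
>>> "A1b".translate({65: 97, 49: None})
'ab'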
