How do you combine more than one pathlib object?

I've got two Path objects using Python's pathlib library, pathA = Path('/source/parent') and pathB = Path('/child/grandchild'). What's the most direct way to combine the two so that you get a Path('/source/parent/child/grandchild') object?

According to the docs, you can do this easily with pathlib.PurePath(*pathsegments):
"Each element of pathsegments can be either a string representing a
path segment, an object implementing the os.PathLike interface which
returns a string, or another path object."
>>> PurePath('foo', 'some/path', 'bar')
PurePosixPath('foo/some/path/bar')
>>> PurePath(Path('foo'), Path('bar'))
PurePosixPath('foo/bar')
So for you it would be:
pathA = pathlib.Path('source/parent')
pathB = pathlib.Path('child/grandchild')
pathAB = pathlib.PurePath(pathA, pathB)
Output: source/parent/child/grandchild
Note
"When several absolute paths are given, the last is taken as an anchor
(mimicking os.path.join()’s behaviour):"
>>> PurePath('/etc', '/usr', 'lib64')
PurePosixPath('/usr/lib64')
>>> PureWindowsPath('c:/Windows', 'd:bar')
PureWindowsPath('d:bar')
This rule also applies to your original paths, because pathB is absolute:
pathA = pathlib.Path('/source/parent')
pathB = pathlib.Path('/child/grandchild')
pathAB = pathlib.PurePath(pathA, pathB)
Because pathB starts with a slash, pathlib treats it as a new anchor and discards pathA entirely.
Output: /child/grandchild

pathA = Path('/source/parent')
pathB = Path('child/grandchild')  # remove the leading slash
print(pathA / pathB)
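If pathB arrives as an absolute path and you cannot edit the literal, one option is to strip its anchor before joining. The following is a minimal sketch, not the only way to do it; it uses PurePosixPath so the result is the same on every platform, and the variable names are illustrative.

```python
from pathlib import PurePosixPath

path_a = PurePosixPath('/source/parent')
path_b = PurePosixPath('/child/grandchild')

# Joining directly discards path_a, because path_b is absolute.
naive = path_a / path_b

# Make path_b relative to its own anchor ('/'), then join.
combined = path_a / path_b.relative_to(path_b.anchor)

print(naive)     # /child/grandchild
print(combined)  # /source/parent/child/grandchild
```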

Related

appending to a list inside a loop using a variable as the list name

my goal is to create several lists out of the contents of several files. In the past, I have used '{}'.format(x) inside of loops as a way to change the paths inside the loop to match whichever item in the list the loop is working on. Now I want to extend that to appending to lists outside the loop. Here is the code I am using currently.
import csv
import os
c3List = []
c4List = []
camList = []
plantList = ('c3', 'c4', 'cam')
for p in plantList:
    plantFolder = folder path
    plantCsv = '{}List.csv'.format(p)
    plantPath = os.path.join(plantFolder, plantCsv)
    with open(plantPath) as plantParse:
        reader = csv.reader(plantParse)
        data = list(reader)
    '{}List'.format(p).append(data)
But this gives me AttributeError: 'str' object has no attribute 'append'.
If I try to make a variable like this:
pList = '{}List'.format(p)
pList.append(data)
I get the same error. Any advice would be appreciated. I am using Python 3.
Because list objects are mutable, you can create a dict that references all of your lists.
For example with this:
myList = []
myDict = {"a": myList}
myDict["a"].append("appended_by_reference")
myList.append("appended_directly")
print(myList)
you will get ['appended_by_reference', 'appended_directly'] printed.
If you want to learn more about mutability and immutability in python see link.
So my own implementation to achieve your goal would be:
import csv
from pathlib import Path
c3List = []
c4List = []
camList = []
plantList = {'c3': c3List, 'c4': c4List, 'cam': camList}
plantFolder = 'folder path'  # placeholder: replace with the real folder
for p in plantList:
    plantCsv = f'{p}List.csv'
    plantPath = Path(plantFolder, plantCsv)
    with open(plantPath) as plantParse:
        reader = csv.reader(plantParse)
        data = list(reader)
    plantList[p].append(data)
Note: I used an f-string to format the file name and pathlib to build the file paths.
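If the separate c3List, c4List and camList variables are not needed elsewhere, the dict can hold the data directly. Here is a small runnable sketch under that assumption; the temporary folder and the sample CSV contents are invented purely so the example runs on its own.

```python
import csv
import tempfile
from pathlib import Path

plant_names = ('c3', 'c4', 'cam')
plant_data = {}

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)

    # Create tiny sample files (hypothetical contents) to read back.
    for name in plant_names:
        (folder / f'{name}List.csv').write_text(f'{name},1\n{name},2\n')

    # One dict entry per plant type, keyed by name instead of by variable.
    for name in plant_names:
        with open(folder / f'{name}List.csv', newline='') as handle:
            plant_data[name] = list(csv.reader(handle))

print(plant_data['c4'])  # [['c4', '1'], ['c4', '2']]
```

Note that this stores the rows directly under each key, whereas append(data) in the answer above wraps them in one more list.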

How to switch the base of a path using pathlib?

I am trying to get a part of a path by removing the base, currently this is what I'm doing:
original = '/tmp/asd/asdqwe/file'
base = '/tmp/asd/'
wanted_part = original.strip(base)
Unfortunately, instead of getting 'asdqwe/file' I'm getting 'qwe/file'; strip behaves in a way I don't understand.
The best solution for my problem would use pathlib.Path, because my function receives its arguments as paths and should return a Path: the trimmed part joined onto a new base path.
But if no pathlib solution is available, a string one would also be great; currently I'm dealing with a weird bug...
You are misinterpreting how str.strip works. The method will remove all characters specified in the argument from the "edges" of the target string, regardless of the order in which they are specified:
original = '/tmp/asd/asdqwe/file'
base = '/tmp/asd/'
wanted_part = original.strip(base)
print(wanted_part)
# qwe/file
What you probably want is slicing:
wanted_part = original[len(base):]
print(wanted_part)
# asdqwe/file
Or, using pathlib:
from pathlib import Path
original = Path('/tmp/asd/asdqwe/file')
base = Path('/tmp/asd/')
wanted_part = original.relative_to(base)
print(wanted_part)
# asdqwe/file
strip removes a set of characters, not a string prefix or suffix, so it keeps removing any of the characters you passed from both ends of the string. Instead, you can test whether the original starts with your base and, if it does, take the remaining characters, i.e. everything after the length of the base.
original = '/tmp/asd/asdqwe/file'
base = '/tmp/asd/'
if original.startswith(base):
    wanted_part = original[len(base):]
    print(wanted_part)
OUTPUT
asdqwe/file
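Since the question also asks about attaching a new base afterwards, relative_to can be combined with the / operator. This is only a sketch: switch_base and the '/var/data' target are hypothetical names chosen for illustration, and PurePosixPath is used so the output does not depend on the operating system.

```python
from pathlib import PurePosixPath

def switch_base(path, old_base, new_base):
    """Return `path` with `old_base` replaced by `new_base`."""
    # relative_to raises ValueError if `path` is not located under `old_base`.
    relative = PurePosixPath(path).relative_to(old_base)
    return PurePosixPath(new_base) / relative

result = switch_base('/tmp/asd/asdqwe/file', '/tmp/asd/', '/var/data')
print(result)  # /var/data/asdqwe/file
```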

How to substitute predicate value by a variable using LXML find() with Python 3.6

I am new to Python coding. I am able to create the output XML file. I want to use a variable which holds a string value and pass it to 'predicate' of 'find()'. Is this achievable? How to make this work?
I am using LXML package with Python 3.6. Below is my code. Area of problem is commented at the end of the code.
import lxml.etree as ET
# Create root element
root = ET.Element("Base", attrib={'Name': 'My Base Node'})
# Create first child element
FirstElement = ET.SubElement(root, "FirstNode", attrib={'Name': 'My First Node', 'Comment':'Hello'})
# Create second child element
SecondElement = ET.SubElement(FirstElement, "SecondNode", attrib={'Name': 'My Second Node', 'Comment': 'World'})
# Create XML file
XML_data_as_string = ET.tostring(root, encoding='utf8')
with open("TestFile.xml", "wb") as f:
    f.write(XML_data_as_string)
# Variable to substitute in second portion of predicate
NewValue = "My Second Node"
# #### AREA OF PROBLEM ###
# Question. How to pass variable 'NewValue' in the predicate?
# Gives "SyntaxError: invalid predicate"
x = root.find("./FirstNode/SecondNode[@Name={subs}]".format(subs=NewValue))
# I commented above line and reexecuted the code with this below line
# enabled. It gave "ValueError: empty namespace prefix must be passed as None,
# not the empty string"
x = root.find("./FirstNode/SecondNode[@Name=%s]", NewValue)
As Daniel Haley said, you're missing single quotes around the value in @Name={subs}.
The following line works for me:
x = root.find("./FirstNode/SecondNode[@Name='{subs}']".format(subs=NewValue))
Since you use Python 3.6, you can utilize f-strings:
x = root.find(f"./FirstNode/SecondNode[@Name='{NewValue}']")
The "proper" way to solve this would be to use XPath variables, which find() does not support (nor does xml.etree from the standard library) but xpath() does.
NewValue = "AJL's Second Node" # Uh oh, that apostrophe is going to break something!!
x_list = root.xpath("./FirstNode/SecondNode[@Name=$subs]", subs=NewValue)
x = x_list[0]
This avoids any sort of issue you might otherwise run into with quoting and escaping.
The main caveat of this method is namespace support, since xpath() doesn't accept the curly-brace syntax that find() uses:
x = root.find("./{foobar.xsd}FirstNode")
# Braces are doubled to avoid conflicting with `.format()`
x = root.find("./{{foobar.xsd}}FirstNode/SecondNode[@Name='{subs}']".format(subs=NewValue))
Instead, you must specify namespaces in a separate dict:
ns_list = {'hello':'foobar.xsd'}
x_list = root.xpath("./hello:FirstNode/SecondNode[@Name=$subs]", namespaces=ns_list, subs=NewValue)
x = x_list[0]
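Where XPath variables are not available, for example when only find()/iterfind() or the standard library's xml.etree.ElementTree can be used, a possible workaround is to select by tag and compare the attribute in Python. The sketch below assumes the standard-library ElementTree (swapped in here so it runs without lxml) and uses illustrative variable names.

```python
import xml.etree.ElementTree as ET

root = ET.Element("Base", attrib={"Name": "My Base Node"})
first = ET.SubElement(root, "FirstNode", attrib={"Name": "My First Node"})
ET.SubElement(first, "SecondNode", attrib={"Name": "AJL's Second Node"})

new_value = "AJL's Second Node"  # contains an apostrophe

# Select candidates by tag only, then compare the attribute in Python,
# so the value never has to be quoted inside the path expression.
matches = [
    element
    for element in root.iterfind("./FirstNode/SecondNode")
    if element.get("Name") == new_value
]

print(matches[0].get("Name"))  # AJL's Second Node
```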

Python 3.6 pathlib Path change name parent directory

The Path class from the pathlib module, added in Python 3.4, seems to be a powerful replacement for approaches such as os.path.join(), but I'm having some trouble working with it.
I have a path that can be anything from
folder_foo/file.csv
to
long/path/to/folder_foo/file.csv
I read the .csv file in folder_foo with pandas, modify it and want to save it to
folder_bar/file.csv
or
long/path/to/folder_bar/file.csv
Essentially I want to rename folder_foo to folder_bar in the Path object.
EDIT: example path code
csv_path = Path("long/path/to/folder_foo/file.csv")
Attempts
1
csv_path.parents[0] = csv_path.parents[0] + "_clean"
This leads to TypeError: unsupported operand type(s) for +: 'PosixPath' and 'str', which means you cannot use + to combine a PosixPath with a str, as described in the question "TypeError: unsupported operand type(s) for +: 'PosixPath' and 'str'".
2
To solve this I tried the following:
csv_path.parents[0] = Path(str(csv_path.parents[0]) + "_clean")
This, however, results in TypeError: '_PathParents' object does not support item assignment.
Since Path.parents is an immutable sequence rather than a list, this error is understandable.
3
Maybe .parts is a better approach, but
csv_path.parts[-2] = csv_path.parts[-2][:-3] + "bar"
results in: TypeError: 'tuple' object does not support item assignment.
Question
How can I easily rename the file's parent folder?
I would rather split this up for readability:
bar_folder = csv_path.parent.parent / 'folder_bar'
csv_path2 = bar_folder / csv_path.name
Having the destination folder as a variable also enables you to create the folder using for example:
bar_folder.mkdir(exist_ok=True)
You could also write a little function to replace the part of the path you want to change. Here's a runnable example:
from pathlib import Path
path1 = Path("a/b/c.txt")
path2 = Path("b/c.txt")
def rename_dir(path, src, dst):
    # convert to list so that we can change elements
    parts = list(path.parts)
    # replace part that matches src with dst
    parts[parts.index(src)] = dst
    return Path(*parts)
rename_dir(path1, 'b', 'q')
#> PosixPath('a/q/c.txt')
rename_dir(path2, 'b', 'q')
#> PosixPath('q/c.txt')
Created at 2021-03-06 10:44:00 PST by reprexlite v0.4.2
EDIT: Found a cleaner solution without str()
csv_path2 = csv_path.parents[1] / (csv_path.parts[-2][:-3] + "bar") / csv_path.parts[-1]
# result
PosixPath('long/path/to/folder_bar/file.csv')
csv_path.parents is a sequence of the path's ancestors: parents[0] is the folder containing the file (long/path/to/folder_foo) and parents[1] is one level further up (long/path/to), which is still a Path object. csv_path.parts[-2] is the name of the last folder as a string. Slicing it with [:-3] drops the last three characters ("foo"), leaving "folder_"; adding "bar" gives "folder_bar", which is joined onto the Path object. Finally, / csv_path.parts[-1] re-adds the file name.
Hack-like solution
csv_path = Path(str(csv_path.parents[0])[:-3] + 'bar/' + csv_path.parts[-1])
It seems a bit unintuitive to me, however. Is there a cleaner solution?
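Another option that avoids slicing strings is with_name, which replaces the final component of a path. The sketch below applies it to the parent folder; rename_parent is a hypothetical helper name, and PurePosixPath is used so the printed result is platform-independent.

```python
from pathlib import PurePosixPath

def rename_parent(path, new_name):
    """Return `path` with its immediate parent folder renamed to `new_name`."""
    # with_name swaps the last component of the parent; the file name is re-attached.
    return path.parent.with_name(new_name) / path.name

long_path = rename_parent(PurePosixPath("long/path/to/folder_foo/file.csv"), "folder_bar")
short_path = rename_parent(PurePosixPath("folder_foo/file.csv"), "folder_bar")

print(long_path)   # long/path/to/folder_bar/file.csv
print(short_path)  # folder_bar/file.csv
```

This raises ValueError when the path has no parent folder to rename (for example a bare "file.csv"), so callers may want to handle that case.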

String items in list: how to remove certain keywords?

I have a set of links that looks like the following:
links = ['http://www.website.com/category/subcategory/1',
'http://www.website.com/category/subcategory/2',
'http://www.website.com/category/subcategory/3',...]
I want to extract the 1, 2, 3, and so on from this list, and store the extracted data in subcategory_explicit. They're stored as str, and I'm having trouble getting at them with the following code:
subcategory_explicit = [cat.get('subcategory') for cat in links if cat.get('subcategory') is not None]
Do I have to change my data type from str to something else? What would be a better way to obtain and store the extracted values?
subcategory_explicit = [i[i.find('subcategory'):] for i in links if 'subcategory' in i]
This uses a substring via slicing, starting at the "s" in "subcategory" until the end of the string. By adding len('subcategory') to the value from find, you can exclude "subcategory" and get "/#" (where # is whatever number).
Try this (using re module):
import re
links = [
'http://www.website.com/category/subcategory/1',
'http://www.website.com/category/subcategory/2',
'http://www.website.com/category/subcategory/3']
d = "|".join(links)
# 'http://www.website.com/category/subcategory/1|http://www.website.com/category/subcategory/2|http://www.website.com/category/subcategory/3'
# Raw string avoids invalid-escape warnings; the group captures the trailing number.
pattern = re.compile(r"/category/\w+/(?P<subcategory_id>\d+)", re.I)
subcategory_explicit = pattern.findall(d)
print(subcategory_explicit)
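If the wanted value is always the last path segment, as it is in the sample links, a regular expression is not strictly needed. The following sketch relies on that assumption and splits each URL once from the right.

```python
links = [
    'http://www.website.com/category/subcategory/1',
    'http://www.website.com/category/subcategory/2',
    'http://www.website.com/category/subcategory/3',
]

# Drop any trailing slash, then keep whatever follows the last '/'.
subcategory_explicit = [link.rstrip('/').rsplit('/', 1)[-1] for link in links]

print(subcategory_explicit)  # ['1', '2', '3']
```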
