Creating nested dictionaries from a list containing paths - python-3.x

I have a list containing paths. For example:
links = ['main',
         'main/path1',
         'main/path1/path2',
         'main/path1/path2/path3/path4',
         'main/path1/path2/path3/path5',
         'main/path1/path2/path3/path4/path6']
I want to create a nested dictionary to store these paths in order. Expected output:
Output = {'main': {'path1': {'path2': {'path3': {'path4': {'path6': {}}, 'path5': {}}}}}}
I am new to Python coding (v3.+) and I am unable to solve it. It gets confusing after I reach path3, since both path4 (with path6 nested inside it) and path5 sit under path3. Can someone please help?

Something like this works:
tree = {}
for path in links:                     # for each path
    node = tree                        # start from the very top
    for level in path.split('/'):      # split the path into its components
        if level:                      # skip empty names
            node = node.setdefault(level, dict())  # move to the deeper level
                                                   # (or create it if nonexistent)
With links defined as above, it results in
>>> tree
{'main': {'path1': {'path2': {'path3': {'path4': {'path6': {}}, 'path5': {}}}}}}
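The same tree can also be built with a self-nesting defaultdict; a minimal alternative sketch (links as defined above; note that a defaultdict prints with a noisier repr than a plain dict):
from collections import defaultdict

def tree_factory():
    return defaultdict(tree_factory)

tree = defaultdict(tree_factory)
for path in links:
    node = tree
    for level in path.split('/'):
        if level:
            node = node[level]  # auto-creates the deeper level on first access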

How may I dynamically create global variables within a function based on input in Python

I'm trying to create a function that returns a dynamically-named list of columns. Usually I can manually name the list, but I now have 100+ csv files to work with.
My goal:
Function creates a list, and names it based on dataframe name
Created list is callable outside of the function
I've done my research, and this answer from an earlier post came very close to helping me.
Here is what I've adapted:
def test1(dataframe):
    # Using globals() to get the dataframe name
    df_name = [x for x in globals() if globals()[x] is dataframe][0]
    # Creating a local dictionary to use with the exec function
    local_dict = {}
    # Trying to generate a name for the list, based on the input dataframe name
    name = 'col_list_' + df_name
    exec(name + " = []", globals(), local_dict)
    # So I can call this list outside the function
    name = local_dict[name]
    for feature in dataframe.columns:
        # Append the feature/column if >=90% of its values are missing
        if dataframe[feature].isnull().mean() >= 0.9:
            name.append(feature)
    return name
To ensure the list name changes based on the DataFrame supplied to the function, I named the list using:
name = 'col_list_' + df_name
The problem comes when I try to make this list accessible outside the function:
name = local_dict[name].
I cannot find a way to assign a dynamic list name to the local dictionary, so I am forced to always call name outside the function to return the list. I want the list to be named based on the dataframe input (e.g. col_list_df1, col_list_df2, col_list_df99).
This answer was very helpful, but it seems specific to variables.
global 'col_list_' + df_name returns a syntax error.
Any help would be greatly appreciated!
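No answer is recorded here, but the usual way around dynamic global names is to store the lists in a dictionary keyed by the generated name; a minimal sketch under that assumption (test1 and the naming scheme follow the question; col_lists is a hypothetical container, not from the thread):
col_lists = {}  # hypothetical module-level container for the dynamically named lists

def test1(dataframe):
    # Using globals() to get the dataframe name, as in the question
    df_name = [x for x in globals() if globals()[x] is dataframe][0]
    # Columns where >=90% of the values are missing
    cols = [c for c in dataframe.columns if dataframe[c].isnull().mean() >= 0.9]
    col_lists['col_list_' + df_name] = cols
    return cols
After test1(df1) runs, the list is reachable anywhere as col_lists['col_list_df1'], with no exec() or globals() manipulation needed for the assignment.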

Defining One Dict Value Based on Another Dict Value (Undefined Value)

My program is a pipeline that processes files. I have a dict (P) which stores directory Paths. All of these directory Paths are relative to a common ROOT Path from which they are generated. The dict works when I define ROOT outside of the dict as follows:
# WORKS
from pathlib import Path

ROOT = Path("/very/long/path/")
P = {
    "ROOT": ROOT,
    "FS_TO_IDX": ROOT / "docs/",
    "IDXD_FS": ROOT / "indexed_docs/",
}
This seems inelegant. Since ROOT is already an element of the dict, I would prefer to use the ROOT value when generating the remaining dict values. However, I get "Undefined variable: P" when I do the following.
# FAILS
from pathlib import Path

P = {
    "ROOT": Path("/very/long/path/"),
    "FS_TO_IDX": P["ROOT"] / "docs/",
    "IDXD_FS": P["ROOT"] / "indexed_docs/",
}
Is there a similar approach that would allow me to assign a dict value and then use that same key/value to define other values in the dict? For example, the walrus operator (:=) seems to provide similar behavior by allowing one to assign to variables within an expression and then use that variable.
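No answer is recorded here, but the walrus operator the question mentions does work for this; a minimal sketch (Python 3.8+, names as in the question):
from pathlib import Path

P = {
    "ROOT": (ROOT := Path("/very/long/path/")),  # bind ROOT while building the dict
    "FS_TO_IDX": ROOT / "docs/",
    "IDXD_FS": ROOT / "indexed_docs/",
}
The assignment expression must be parenthesized inside the dict literal, and it also leaves ROOT defined as an ordinary variable afterwards.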

loop to read multiple files

I am using ObsPy's _read_segy function to read a SEG-Y file using the following line of code:
line_1 = _read_segy('st1.segy')
However, I have a large number of files in a folder, as follows:
st1.segy
st2.segy
st3.segy
.
.
st700.segy
I want to use a for loop to read the data, but I am new, so can anyone help me in this regard?
Currently I am using repeated lines to read the data, as follows:
line_1 = _read_segy('st1.segy')
line_2 = _read_segy('st2.segy')
The next step is to display the SEG-Y data using matplotlib, and again I am using the following lines of code on individual lines, which is far too much repeated work. Can someone help me with creating a loop to display the data and save the figures?
data = np.stack([t.data for t in line_1.traces])
vm = np.percentile(data, 99)
plt.figure(figsize=(60, 30))
plt.imshow(data.T, cmap='seismic', vmin=-vm, vmax=vm, aspect='auto')
plt.title('Line_1')
plt.savefig('Line_1.png')
plt.show()
Your kind suggestions will help me a lot, as I am a beginner in Python programming. Thank you.
If you want to reduce code duplication, you can use functions, and if you want to do something repeatedly, you can use loops. So you can call a function in a loop if you want to do this for all files.
For reading the files in a folder, you can use Python's glob module. Something like below:
import glob
import os

def save_fig(in_file_name, out_file_name):
    line = _read_segy(in_file_name)
    data = np.stack([t.data for t in line.traces])
    vm = np.percentile(data, 99)
    plt.figure(figsize=(60, 30))
    plt.imshow(data.T, cmap='seismic', vmin=-vm, vmax=vm, aspect='auto')
    plt.title(out_file_name)
    plt.savefig(out_file_name)

segy_files = list(glob.glob(segy_files_path + "/*.segy"))
for index, file in enumerate(segy_files):
    save_fig(file, "Line_{}.png".format(index + 1))
I have not added the other imports here, which you know to add. segy_files_path is the folder where your files reside.
You just need to dynamically open the files in a loop. Fortunately they all follow the same naming pattern.
N = 700
for n in range(1, N + 1):  # files are numbered st1 ... st700
    line_n = _read_segy(f"st{n}.segy")  # dynamic file name
    data = np.stack([t.data for t in line_n.traces])
    vm = np.percentile(data, 99)
    plt.figure(figsize=(60, 30))
    plt.imshow(data.T, cmap="seismic", vmin=-vm, vmax=vm, aspect="auto")
    plt.title(f"Line_{n}")
    plt.savefig(f"Line_{n}.png")  # save before show(), which clears the figure
    plt.show()
    plt.close()  # needed if you don't want to keep 700 figures open
I'll focus on addressing the file looping, as you said you're new and I'm assuming simple loops are something you'd like to learn about (the first example is sufficient for this).
If you'd like an answer to your second question, it might be worth providing some example data, the output result (graph) of your current attempt, and a description of your desired output. If you provide that reproducible example and clear description of the problem you're having it'd be easier to answer.
Create a list (or other iterable) to hold the file names to read, and another container (maybe a dict) to hold the result of each _read_segy call.
files = ['st1.segy', 'st2.segy']
lines = {}  # creates an empty dictionary; dictionaries consist of key: value pairs
for f in files:  # f will first be 'st1.segy', then 'st2.segy'
    lines[f] = _read_segy(f)
As stated in the comment by @Guimoute, if you want to generate the file names dynamically, you can create the files list by appending integers to the base file name.
lines = {}  # creates an empty dictionary; dictionaries have key: value pairs
missing_files = []
for i in range(1, 701):
    f = f"st{i}.segy"  # gives "st1.segy" for i = 1
    try:  # in case one of the files is missing or can't be read
        lines[f] = _read_segy(f)
    except Exception:
        missing_files.append(f)  # store names of missing or unreadable files
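A small follow-up sketch (not part of the answer above) to report anything that failed:
if missing_files:
    print(f"{len(missing_files)} file(s) could not be read: {missing_files}")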

Convert everything in a dictionary to lower case, then filter on it?

import pandas as pd
import nltk
import os

directory = os.listdir(r"C:\...")
x = []
num = 0
for i in directory:
    x.append(pd.read_fwf("C:\\..." + i))
    x[num] = x[num].to_string()
    num += 1
So, once I have a dictionary x = [ ] populated by the read_fwf for each file in my directory:
I want to know how to make it so every single character is lowercase. I am having trouble understanding the syntax and how it is applied to a dictionary.
I want to define a filter that I can use to count occurrences of a list of words in this newly defined dictionary, e.g.,
list = ['bus', 'car', 'train', 'aeroplane', 'tram', ...]
Edit: Quick unrelated question:
Is pd.read_fwf the best way to read .txt files? If not, what else could I use?
Any help is very much appreciated. Thanks
Edit 2: Sample data and output that I want:
Sample:
The Horncastle boar's head is an early seventh-century Anglo-Saxon
ornament depicting a boar that probably was once part of the crest of
a helmet. It was discovered in 2002 by a metal detectorist searching
in the town of Horncastle, Lincolnshire. It was reported as found
treasure and acquired for £15,000 by the City and County Museum, where
it is on permanent display.
Required output - changes everything in uppercase to lowercase:
the horncastle boar's head is an early seventh-century anglo-saxon
ornament depicting a boar that probably was once part of the crest of
a helmet. it was discovered in 2002 by a metal detectorist searching
in the town of horncastle, lincolnshire. it was reported as found
treasure and acquired for £15,000 by the city and county museum, where
it is on permanent display.
You shouldn't need to use pandas or dictionaries at all. Just use Python's built-in open() function:
# Open a file in read mode with a context manager
with open(r'C:\path\to\your\file.txt', 'r') as file:
    # Read the file into a string
    text = file.read()

# Use the string's lower() method to make everything lowercase
text = text.lower()
print(text)

# Split the text by whitespace into a list of words
word_list = text.split()

# Get the number of elements in the list (the word count)
word_count = len(word_list)
print(word_count)
If you want, you can do it in the reverse order:
# Open a file in read mode with a context manager
with open(r'C:\path\to\your\file.txt', 'r') as file:
    # Read the file into a string
    text = file.read()

# Split the text by whitespace into a list of words
word_list = text.split()

# Use a list comprehension to create a new list with lower() applied to each word
lowercase_word_list = [word.lower() for word in word_list]
print(lowercase_word_list)
Using a context manager for this is good since it automatically closes the file for you as soon as it goes out of scope (execution leaves the with block). Otherwise you would have to call open() and then remember to call file.close() yourself.
I think there are some other benefits to using context managers, but someone please correct me if I'm wrong.
I think what you are looking for is dictionary comprehension:
# Python 3
new_dict = {key: val.lower() for key, val in old_dict.items()}
# Python 2
new_dict = {key: val.lower() for key, val in old_dict.iteritems()}
items()/iteritems() gives you a list of tuples of the (keys, values) represented in the dictionary (e.g. [('somekey', 'SomeValue'), ('somekey2', 'SomeValue2')])
The comprehension iterates over each of these pairs, creating a new dictionary in the process. In the key: val.lower() section, you can do whatever manipulation you want to create the new dictionary.
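For the counting part of the question, a minimal Counter-based sketch (text is assumed to hold the file contents, as in the open() answer above; punctuation stuck to words may need stripping first):
from collections import Counter

targets = ['bus', 'car', 'train', 'aeroplane', 'tram']
# Count every word in the lowercased text, then keep only the target words
counts = Counter(text.lower().split())
target_counts = {word: counts[word] for word in targets}
print(target_counts)  # e.g. {'bus': 2, 'car': 0, ...}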

Abaqus Python script -- Reading 'TENSOR_3D_FULL' data from *.odb file

What I want: strain values LE11, LE22, LE12 at nodal points
My script is:
#!/usr/local/bin/python
# coding: latin-1

# making the ODB commands available to the script
from odbAccess import *
import sys
import csv

odbPath = "my *.odb path"
odb = openOdb(path=odbPath)
assembly = odb.rootAssembly

# count the number of frames
NumofFrames = 0
for v in odb.steps["Step-1"].frames:
    NumofFrames = NumofFrames + 1

# create a variable that refers to the reference (undeformed) frame
refFrame = odb.steps["Step-1"].frames[0]

# create a variable that refers to the node set 'Region Of Interest (ROI)'
ROINodeSet = odb.rootAssembly.nodeSets["ROI"]

# create a variable that refers to the reference coordinate 'REFCOORD'
refCoordinates = refFrame.fieldOutputs["COORD"]

# create a variable that refers to the coordinates of the node
# set in the test frame of the step
ROIrefCoords = refCoordinates.getSubset(region=ROINodeSet, position=NODAL)

# count the number of nodes
NumofNodes = 0
for v in ROIrefCoords.values:
    NumofNodes = NumofNodes + 1

# looping over all the frames in the step
for i1 in range(NumofFrames):
    # create a variable that refers to the current frame
    currFrame = odb.steps["Step-1"].frames[i1 + 1]
    # create a variable that refers to the strain 'LE'
    Str = currFrame.fieldOutputs["LE"]
    ROIStr = Str.getSubset(region=ROINodeSet, position=NODAL)
    # initialize the list of rows
    rows = []
    # loop over all the nodes in each frame
    for i2 in range(NumofNodes):
        strain = ROIStr.values[i2]
        rows.append(str(strain.dataDouble[0]) + ";" + str(strain.dataDouble[1]) +
                    ";" + str(strain.dataDouble[3]))
    # write the list to a new *.csv file (code not included for brevity)

odb.close()
The error I get is:
strain = ROIStr.values[i2]
IndexError: Sequence index out of range
Additional info:
Details for ROIStr:
ROIStr.name
'LE'
ROIStr.type
TENSOR_3D_FULL
ROIStr.description
'Logarithmic strain components'
ROIStr.componentLabels
('LE11', 'LE22', 'LE33', 'LE12', 'LE13', 'LE23')
ROIStr.getattribute
'getattribute of openOdb(r'path to .odb').steps['Step-1'].frames[1].fieldOutputs['LE'].getSubset(position=INTEGRATION_POINT, region=openOdb(r'path to.odb').rootAssembly.nodeSets['ROI'])'
When I use the same code for VECTOR objects, like 'U' for nodal displacement or 'COORD' for nodal coordinates, everything works without a problem.
The error happens in the first loop. So, it is not the case where it cycles several loops before the error happens.
Question: Does anyone know what is causing the error in the above code?
Here is the reason you get an IndexError: strains are (obviously) calculated at the integration points. According to the ABQ Scripting Reference Guide:
A SymbolicConstant specifying the position of the output in the element. Possible values are:
NODAL, specifying the values calculated at the nodes.
INTEGRATION_POINT, specifying the values calculated at the integration points.
ELEMENT_NODAL, specifying the values obtained by extrapolating results calculated at the integration points.
CENTROID, specifying the value at the centroid obtained by extrapolating results calculated at the integration points.
In order to use your code, therefore, you should get the results using position=ELEMENT_NODAL:
ROIrefCoords = refCoordinates.getSubset(region=ROINodeSet, position=ELEMENT_NODAL)
With
ROIStr.values[0].data
you will then get an array containing the 6 independent components of your tensor.
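As a follow-up sketch (assumed, not part of the answer): the componentLabels shown above tell you which indices of data hold the LE11, LE22 and LE12 values the question asks for:
labels = ROIStr.componentLabels  # ('LE11', 'LE22', 'LE33', 'LE12', 'LE13', 'LE23')
i11, i22, i12 = labels.index('LE11'), labels.index('LE22'), labels.index('LE12')
for v in ROIStr.values:
    le11, le22, le12 = v.data[i11], v.data[i22], v.data[i12]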
Alternative Solution
For reading a time series of results for a node set, you can use the function xyPlot.xyDataListFromField(). I noticed that this function is much faster than reading the .odb directly as above. The code is also shorter; the only drawback is that you need an Abaqus license to use it (in contrast to the odbAccess approach, which works with abaqus python, only needs an installed version of Abaqus, and does not need a network license).
For your application, you should do something like:
from abaqus import *
from abaqusConstants import *
from abaqusExceptions import *
import visualization
import xyPlot
import displayGroupOdbToolset as dgo

results = session.openOdb(your_file + '.odb')
# without this, you won't be able to extract the results
session.viewports['Viewport: 1'].setValues(displayedObject=results)
xyList = xyPlot.xyDataListFromField(
    odb=results,
    outputPosition=NODAL,
    variable=(('LE', INTEGRATION_POINT,
               ((COMPONENT, 'LE11'), (COMPONENT, 'LE22'),
                (COMPONENT, 'LE33'), (COMPONENT, 'LE12'))),),
    nodeSets=('ROI',))
(Of course you have to add LE13 etc.)
You will get a list of xyData objects:
type(xyList[0])
<type 'xyData'>
containing the desired data for each node and each requested output. Its size will therefore be
len(xyList)
number_of_nodes * number_of_requested_outputs
where the first number_of_nodes elements of the list are LE11 at each node, then LE22, and so on.
You can then transform this into a NumPy array:
LE11_1 = np.array(xyList[0])
would be LE11 at the first node, with dimensions:
LE11_1.shape
(NumberTimeFrames, 2)
That is, for each time frame you have the time and the output variable.
NumPy arrays are also very easy to write on text files (check out numpy.savetxt).
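A minimal sketch (output file name assumed; xyList as produced above):
import numpy as np

LE11_1 = np.array(xyList[0])  # columns: time, LE11 at the first node
np.savetxt('LE11_node1.txt', LE11_1, delimiter=';', header='time;LE11')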
