Get a top level from Path object of pathlib

Get a top level from Path object of pathlib - python-3.x

I use pathlib to match all files recursively to filter the files based on their content. Then I would like to find what is the top level of the folder of this file. Assume the following. I have a file in the folder:
a/b/c/file.log
I do the search from the level a:
for f in path_data.glob("**/*"):
if something inside file f:
# I would like to get in what folder this file is, i.e. 'b'
I now that I can get all parents levels using:
f.parents would give me b/c
f.parent would give me c
f.name would give me file.log
But how could I get b?
Just to precise: the number of levels where the file is stored is not known.
UPD: I know I could do it with split, but I would like to know if there is a proper API to do that. I couldn't find it.

The question was asked a while ago, but didn't quite get the attention. Nevertheless, I still would publish the answer:
f.parts[0]

Related

How to read multiple CSV (leaving out specific ones) from a nested directory in PySpark?

Lets say I have a directory called 'all_data', and inside this, I have several other directories based on the date of the data that it contains. These directories are named date_2020_11_01 to date_2020_11_30 and each one of these contain csv files which I intend to read in a single dataframe.
But I don't want to read the data for date_2020_11_15 and date_2020_11_16. How do I do it?

I'm not sure how to exclude certain files, but you can specify a range of file names using brackets. Code below would select all files without 11_15 and 11_16:
spark.read.csv("date_2020_11_{1[0-4,7-9],[0,2-3][0-9]}.csv")

df= spark.read.format("parquet").option("header", "true").load(paths)
where paths is a list of all the paths where data is present, worked for me.

Simple method is, read all data directory as it is and apply filter condition
df.filter("dataColumn != 'date_2020_11_15' & 'date_2020_11_16'")
Else you can use OS module read directory and iterate to that list to eliminate those date directory using condition.

Attempting to append all content into file, last iteration is the only one filling text document

I'm trying to Create a file and append all the content being calculated into that file, but when I run the script the very last iteration is written inside the file and nothing else.
My code is on pastebin, it's too long, and I feel like you would have to see exactly how the iteration is happening.
Try to summarize it, Go through an array of model numbers, if the model number matches call the function that calculates that MAC_ADDRESS, when done calculating store all the content inside a the file.
I have tried two possible routes and both have failed, giving the same result. There is no error in the code (it runs) but it just doesn't store the content into the file properly there should be 97 different APs and it's storing only 1.
The difference between the first and second attempt,
1 attempt) I open/create file in the beginning of the script and close at the very end.
2 attempt) I open/create file and close per-iteration.
First Attempt:
https://pastebin.com/jCpLGMCK
#Beginning of code
File = open("All_Possibilities.txt", "a+")
#End of code
File.close()
Second Attempt:
https://pastebin.com/cVrXQaAT
#Per function
File = open("All_Possibilities.txt", "a+")
#per function
File.close()
If I'm not suppose to reference other websites, please let me know and I'll just paste the code in his post.

Rather than close(), please use with:
with open('All_Possibilities.txt', 'a') as file_out:
file_out.write('some text\n')
The documentation explains that you don't need + to append writes to a file.
You may want to add some debugging console print() statements, or use a debugger like pdb, to verify that the write() statement actually ran, and that the variable you were writing actually contained the text you thought it did.
You have several loops that could be a one-liner using readlines().
Please do this:
$ pip install flake8
$ flake8 *.py
That is, please run the flake8 lint utility against your source code,
and follow the advice that it offers you.
In particular, it would be much better to name your identifier file than to name it File.
The initial capital letter means something to humans reading your code -- it is
used when naming classes, rather than local variables. Good luck!

Move non-sequential files to new directory

I have no previous programming experience. I know this question has been asked before or the answer is out there but I, for the life of me, cannot find it. I have searched google for hours trying to figure this out. I am working on a Red Hat Linux computer and it is in bash.
I have a directory of files 0-500 in /directory/.
They are named as such,
/directory/filename_001, /directory/filename_002, and so forth.
After running my analysis for my research, I have a listofnumbers.txt (txt file, with each row being a new number) of the numbers that I am interested in. For example,
015
124
187
345
412
A) Run a command from the list of files the files from the list of numbers? Our code looks like this:
g09slurm filename_001.com filename_001.log
Is there a way to write something like:
find value (row1 of listofnumbers.txt) then g09slurm filename_row1value.com filename_row1value.log
find value (row2 of listofnumbers.txt) then g09slurm filename_row2value.com filename_row2value.log
find value (row3 of listofnumbers.txt) then g09slurm filename_row3value.com filename_row2value.log
etc etc
B) Move the selected files from the list to a new directory, so I can rename them sequentially, then run a sequential number command?
Thanks.

First, read the list of files into an array:
readarray myarray < /path/to/filename.txt
Next, we'll get all the filenames based on those numbers, and move them
cd /path/to/directory
mv -t /path/to/new_directory "${myarray[#]/#/filename_}"
After this... honestly, I got bored. Stack Overflow is about helping people who make a good start at a problem, and you've done zero work toward figuring this out (other than writing "I promise I tried google").
I don't even understand what
Run a command from the list of files the files from the list of numbers
means.
To rename them sequentially (once you've moved them), you'll want to do something based on this code:
for i in $(ls); do
*your stuff here*
done
You should be able to research and figure stuff out. You might have to do some bash tutorials, here's a reasonable starting place

How to change many .sra documents into one fastq document?

I download a series of .sra which belong to one sample from NCBI. I tried to change one sra into fastq, but it is error.
My code:
$fastq-dump I --split-files ERRXXXXX.sra.
And My .sra document is paired.
I used $fastq-dump SRR5XXXXX.sra to change another process, and it worked well.
Therefore I would like to know how to make many .sra into one .fastq document? Thank you for your kindness.

I don't really understand the whole message, but regarding your specific question "how to make many .sra into one .fastq document", the answer is pretty simple:
Generate multiple fastq files from all the sra files you are interested in in the usual way.
Concatenate all those fastq files in a single one: cat fastq1.fq fastq2.fq ... fastqN.fq > new_fastq.fq
Remove intermediate files if no longer needed
The new_fastq.fq file contains all the information from the original sra files.
Take care and don't mix first and second ends in the same fastq (unless you know what you are doing, of course).

Searching in multiple files using findstr, only proceeding with the resulting files? (cmd)

I'm currently working on a project where I search hundreds of files using findstr in the command line. If I find the string which I searched for, I want to proceed with this exact file (and the other ones that include my string).
So in my case:
I searched for the string WRI2016 by using:
H:\KOBINI>findstr "WRI2016" *.ini > %temp%\xx.txt && %temp%\xx.txt
To see what the PC does, I save it in a .txt file as you can see.
So if my file includes WRI2016 I want to extract some facts out of the file. In my case it is NR, Kunde, WebHDAktiv, DigIDAktiv.
But I just can't find a proper way to link both of these functions.
At first I simply printed all of the parameters:
H:\KOBINI>findstr "\<NR Kunde WRI2016 WebHDAktiv DigIDAktiv" *.ini > %temp%\xx.csv && %temp%\xx.csv
I also played around using the if command but that didn't really work out. I'm pretty new to this stuff as you'll see in my following tries to solve this problem:
H:\KOBINI>findstr "\<NR DigIDAktiv WebHDAktiv" set a =*.ini findstr "WRI2016" set b =*.ini if a EQU b > %temp%\xx.txt && %temp%\xx.txt
So all I wanted to achieve with that weird code was: if there is a WRI2016 in the file, give me the remaining parameters. But that didn't work out at all.
I also tried it with using new lines for every command which didn't change a thing.
As I want this to be a .csv in the end I want to add a semicolon between my parameters, any chance how I could do that? I've seen versions using -s";" which didn't do anything for me.

Sorry, I'm quite new and thought I'd give it a shot.
an example of my .ini files Looks like this:
> Kunde=Markt
> Nr=101381
> [...]
> DigIDAktiv=Ja
> WebHDAktiv=Nein
> Version=WRI2016_U2_P1
some files have a different Version though.
So I only want to know "NR, DigIDAktiv ..." if it's the 2016 Version.
As a result it should be sorted in a CSV, in different columns.
My Folder Looks like this
So I search These files in order to find Version 2016 and then try to extract my Information and put it into a .csv

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Get a top level from Path object of pathlib - python-3.x

The question was asked a while ago, but didn't quite get the attention. Nevertheless, I still would publish the answer: f.parts[0]

Related

How to read multiple CSV (leaving out specific ones) from a nested directory in PySpark?

Attempting to append all content into file, last iteration is the only one filling text document

Move non-sequential files to new directory

How to change many .sra documents into one fastq document?

Searching in multiple files using findstr, only proceeding with the resulting files? (cmd)

Categories

Resources