How to get the list of Perforce depots that have no changelists?

Is it possible to get the list of depots (by name) that have no changelists, along with their creation dates?

Run p4 changes against each depot, and print the name/time of each that has no results.
Here's a quick example using P4Python:
from datetime import datetime
from P4 import P4

with P4().connect() as p4:
    for d in p4.run_depots():
        depot = d['name']
        if not p4.run_changes("-m1", f"//{depot}/..."):
            print(depot, datetime.fromtimestamp(int(d['time'])))
When I run this script against my own local server it lists all the depots I've made that don't have any changelists in them:
Sprocket 2019-07-25 00:02:31
Widget 2019-07-24 23:45:04
repo 2020-04-28 09:53:13
spec 2022-02-08 08:23:23
compared to the full list of depots from p4 depots:
Depot Sprocket 2019/07/25 stream 1 Sprocket/... 'Created by Samwise. '
Depot Widget 2019/07/24 stream 1 Widget/... 'Created by Samwise. '
Depot collaborators 2020/07/12 stream 1 collaborators/... 'Created by Samwise. '
Depot depot 2019/09/22 local depot/... 'Created by Samwise. '
Depot repo 2020/04/28 local repo/... 'Created by Samwise. '
Depot spec 2022/02/08 spec .p4s spec/... 'Created by Samwise. '
Depot stream 2017/11/02 stream stream/... ''
Note that the time on the depot is the modification time; the depot spec doesn't maintain the original creation time. However, it's likely that if no changelists have ever been submitted into a depot, the depot spec itself hasn't been modified since its creation either.

Related

How to get change list details for provided users with specified date?

I need to collect Perforce changelist details for given users within a given date range (e.g. from 01-06-22 to 25-12-22) in CSV format.
Expected output in CSV, one row per changelist:
S.No,Perforce Username,Change List,Submitted Date,Workspace,Description
1,dary.spitzer,123456,02-08-2022,Daryl_Spitzer_Workspace,Fixed alarm issues
2,shadkam.san,78910,24-12-2022,Shadkam_san_Workspace,PostgreSQL Support added
Thank you very much for reading and any help is much appreciated.
Use p4 changes with the -u flag to specify the user and the dates given as a revision range argument:
C:\Perforce\workshop>p4 changes -u samwise @2009/01/01,2010/01/01
Change 7479 on 2009/11/12 by samwise@samwise-silver 'Fix typo in last change. '
Change 7113 on 2009/01/22 by samwise@samwise-silver 'Make VSStoP4 html page a redire'
To reformat this into something resembling a CSV at the CLI you could use the -F flag:
p4 -Ztag -F %user%,%change%,%client%,%desc% changes -u samwise @2009/01/01,2010/01/01
samwise,7479,samwise-silver,Fix typo in last change.
samwise,7113,samwise-silver,Make VSStoP4 html page a redire
I would personally use Python rather than Bash to finish massaging this into the desired form though:
import csv
from datetime import datetime
import sys
from P4 import P4

out = csv.writer(sys.stdout)
out.writerow([
    "S.No",
    "Perforce Username",
    "Change List",
    "Submitted Date",
    "Workspace",
    "Description"
])
my_users = {'samwise'}
with P4().connect() as p4:
    for i, change in enumerate(p4.run_changes('@2009/01/01,2010/01/01'), 1):
        if change['user'] not in my_users:
            # This makes it easy to handle multiple users
            continue
        out.writerow([
            i,
            change['user'],
            change['change'],
            datetime.fromtimestamp(int(change['time'])).date(),
            change['client'],
            change['desc'].strip()
        ])
produces:
S.No,Perforce Username,Change List,Submitted Date,Workspace,Description
1,samwise,7479,2009-11-12,samwise-silver,Fix typo in last change.
2,samwise,7113,2009-01-22,samwise-silver,Make VSStoP4 html page a redire

Create Folder Based on File Name in Azure Data Factory

I have a requirement to copy a few files from one ADLS Gen1 location to another ADLS Gen1 location, but I have to create folders based on the file names.
I have a few files like the below in the source ADLS:
ABCD_20200914_AB01_Part01.csv.gz
ABCD_20200914_AB02_Part01.csv.gz
ABCD_20200914_AB03_Part01.csv.gz
ABCD_20200914_AB03_Part01.json.gz
ABCD_20200914_AB04_Part01.json.gz
ABCD_20200914_AB04_Part01.csv.gz
Scenario-1
I have to copy these files to the destination ADLS as below, keeping only the CSV files and creating a folder from the file name (if the folder exists, copy to that folder):
AB01-
|-ABCD_20200914_AB01_Part01.csv.gz
AB02-
|-ABCD_20200914_AB02_Part01.csv.gz
AB03-
|-ABCD_20200914_AB03_Part01.csv.gz
AB04-
|-ABCD_20200914_AB04_Part01.csv.gz
Scenario-2
I have to copy these files to the destination ADLS as below, keeping both the CSV and JSON files and creating a folder from the file name (if the folder exists, copy to that folder):
AB01-
|-ABCD_20200914_AB01_Part01.csv.gz
AB02-
|-ABCD_20200914_AB02_Part01.csv.gz
AB03-
|-ABCD_20200914_AB03_Part01.csv.gz
|-ABCD_20200914_AB03_Part01.json.gz
AB04-
|-ABCD_20200914_AB04_Part01.csv.gz
|-ABCD_20200914_AB04_Part01.json.gz
Is there any way to achieve this in Data Factory?
Appreciate any leads!
So I am not sure if this will entirely help, but I had a similar situation where we had one zip file and I had to copy the files inside it out into their own folders.
What you can do is use parameters on the sink dataset you will be using, plus a variable activity where you do a substring.
The job below is more of a delta job, but I think it has enough in it to hopefully help. My job can be divided into 3 sections.
The first (orange) section gets the latest file-name date from the ADLS Gen1 folder that you want to copy.
On the bottom I get the latest file name based on the ADLS Gen1 date, and then I do a substring where I take out the date portion of the file name. In your case you might be able to use an array and capture all of the folder names that you need.
Getting file name
Getting Substring
In the top section I first extract and unzip that file into a test landing zone.
Source
Sink
I then get the names of all the files that were in that zip file so they can be used in the ForEach activity. These file names will then become folders for the copy activity.
Get File names from initial landing zone:
I then pass those childItems from "Get list of staged files" into the ForEach:
In that ForEach activity I have one copy activity. For that I made two datasets: one to grab the files from the initial landing zone that we created. For this example let's call it Staging (forgive the MS Paint drawing):
The purpose of this is to go to that dummy folder and grab each file that was just copied in there. From that one zip file we expect 5 files.
In the Sink section what I did was create a new dataset with parameters for folder and file name. In that dataset I am putting the data into the same container, but I created a new folder called "Stage" and concatenated it with the item name. I also added a "replace" command to remove the ".txt" from the file name.
What this does is give every file coming from that dummy staging area a folder named specifically for that file. Based on your requirements I am not sure if that is exactly what you want, but you can always rework it to be more specific.
For the item name I basically take the same file name, replace the ".txt", concatenate the date value, and only after that add the ".txt" extension back. Otherwise I would have ended up with ".txt" in the middle of the file name.
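To relate that back to the question's file pattern, here is a minimal Python sketch of the folder-name logic you would reproduce with dynamic-content expressions on the sink dataset parameters; it assumes the names follow the ABCD_<date>_<folder>_<part> pattern from the sample files, and the variable names are just for illustration:
# Sketch only: derive the destination folder (e.g. "AB01") from each file name,
# assuming names follow ABCD_<yyyymmdd>_<folder>_<part>.<ext>.gz as in the question.
files = [
    "ABCD_20200914_AB01_Part01.csv.gz",
    "ABCD_20200914_AB02_Part01.csv.gz",
    "ABCD_20200914_AB03_Part01.json.gz",
]

# Scenario 1: keep only ".csv.gz"; Scenario 2: keep both extensions.
wanted_extensions = (".csv.gz", ".json.gz")

for name in files:
    if not name.endswith(wanted_extensions):
        continue
    folder = name.split("_")[2]        # third underscore-separated token, e.g. "AB01"
    destination = f"{folder}/{name}"   # e.g. "AB01/ABCD_20200914_AB01_Part01.csv.gz"
    print(destination)
In ADF the same idea becomes the expression you bind to the folder parameter of the sink dataset inside the ForEach, with the current item's file name as the input.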
In the end I created a delete activity that is then used to delete all the files (I am not sure if I have set that up properly, so feel free to adjust).
Hopefully the description above gave you an idea on how to use parameters for your files. Let me know if this helps you in your situation.

Get the latest AWS S3 folder when both the folder and the files inside it are created at the same time (boto3)

I'm trying to get the latest folder in a given S3 prefix using the code below.
For ex:
s3a://mybucket/data/timestamp=20180612165132/part1.parquete
s3a://mybucket/data/timestamp=20180612165132/part2.parquete
s3a://mybucket/data/timestamp=20180613165132/part1.parquete
s3a://mybucket/data/timestamp=20180614165132/part1.parquete
s3a://mybucket/data/timestamp=20180615165132/part1.parquete
I need to find the latest timestamp folder under the data folder.
keys = []
oldest = None
kwargs = {'Bucket': bucket_name, 'Prefix': key}
while True:
    resp = get_conn().list_objects_v2(**kwargs)
    for obj in resp['Contents']:
        keys.append({'Key': obj['Key'], 'LastModified': obj['LastModified']})
    try:
        kwargs['ContinuationToken'] = resp['NextContinuationToken']
    except KeyError:
        break
logger.info("Got {0} keys".format(len(keys)))
for key in keys:
    oldest = key['LastModified'] if oldest is None or key['LastModified'] < oldest else oldest
return oldest
The issue is that I have hundreds of files under each timestamp folder. In the code above I am getting the timestamp of each file and finding the oldest file under each timestamp folder, to infer the timestamp folder's creation date.
I'm using this code because S3 treats the whole path as one object:
s3a://mybucket/data/timestamp=20180612165132/part1.parquete
so there is no way I am able to get the LastModified date of the timestamp folder itself.
And this feels very expensive, as there can be hundreds of timestamp folders and each folder has hundreds of files.
Is there a better way to achieve this?
As Josh says in the comments: there are no directories, so no directory timestamp.
The tools just make them up, such as in S3AFileStatus.
Some ideas:
if the "folders" have their timestamp in the name, do a list of the parent path with the suffix "/" and look for the entry whose timestamp is highest (see the sketch below).
have each query write an index file in the base dir containing the name of its directory. Load that and you get the name of the latest one. Later jobs will overwrite it. Warning: S3 overwrite consistency means you may get the older version, at least for a brief period (seconds, tens of seconds at worst, usually).
option #2 would probably be fastest
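For option #1, here is a minimal boto3 sketch of what that listing could look like, assuming the bucket name and data/ prefix from the question and that the timestamp embedded in each prefix name is what defines "latest":
import boto3

s3 = boto3.client("s3")

# List only the immediate "subdirectories" of data/ via CommonPrefixes,
# then pick the one whose embedded timestamp sorts highest.
prefixes = []
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="mybucket", Prefix="data/", Delimiter="/"):
    for cp in page.get("CommonPrefixes", []):
        prefixes.append(cp["Prefix"])   # e.g. "data/timestamp=20180615165132/"

# The fixed-width yyyymmddHHMMSS timestamps sort lexicographically,
# so the maximum prefix is the latest folder.
latest = max(prefixes)
print(latest)
This only lists the folder-level prefixes, not every file under them, so it avoids walking hundreds of objects per timestamp folder.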

Perforce Streams - Importing a stream which imports other streams

When importing a stream, is there a way to have the files from the imported stream's imports pulled into the workspace?
For example:
StreamA
StreamB imports StreamA
StreamC imports StreamB
I would like to know if there is a way for a workspace of StreamC to have the files from StreamC, StreamB and StreamA. From my testing, Perforce will only populate a StreamC workspace with files from StreamC and StreamB. If this is not possible or intentionally not allowed, what is the rationale? Thanks!
It's not possible because an import operates at the depot path level, rather than at the stream level. So if you have:
import //depot/streamB/...
you're not importing all of the files mapped by streamB, you're only mapping the files in the named depot path.
There is not presently a way to refer to the files mapped by a stream as a unit -- mostly people "fake it" by using the depot path, but as you've discovered, if the stream uses anything other than the default share ... Path definition, they aren't really the same thing.
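For illustration, here is a hypothetical sketch of the Paths fields involved, assuming each stream lives at the default //depot/<StreamName>/... location (the names are just taken from the example above):
# StreamB's spec
Paths:
    share ...
    import streamA/... //depot/StreamA/...

# StreamC's spec
Paths:
    share ...
    import streamB/... //depot/StreamB/...
A StreamC workspace maps //depot/StreamC/... plus the literal depot path //depot/StreamB/...; StreamB's import line lives in StreamB's spec, not under //depot/StreamB/..., so nothing from //depot/StreamA/... is pulled in.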

Will Spark wholeTextFiles pick up a partially created file?

I am using the Spark wholeTextFiles API to read files from a source folder and load them into a Hive table.
Files arrive in the source folder from a remote server. The files are huge, around 1 GB-3 GB, and the SCP of each file takes quite a while.
If I launch the Spark job while a file is still being SCPed to the source folder and the transfer is halfway done, will Spark pick up the file?
If Spark picks up the file when it is halfway copied, that would be a problem, since it would ignore the rest of the file's content.
Possible way to resolve:
At the end of each file copy, SCP a zero-KB marker file to indicate that the SCP is complete.
In the Spark job, when you do sc.wholeTextFiles(...), pick only those file names that have a corresponding zero-KB file, using map.
So, here's code to check whether corresponding .ctl files are present in the source folder.
val fr = sc.wholeTextFiles("D:\\DATA\\TEST\\tempstatus")
// Get only the .ctl files
val temp1 = fr.map(x => x._1).filter(x => x.endsWith(".ctl"))
// Identify the corresponding REAL files - without the .ctl suffix
val temp2 = temp1.map(x => (x.replace(".ctl", ""), x.replace(".ctl", "")))
// Join on the real file names so only completed files are kept
val result = fr
  .join(temp2)
  .map {
    case (_, (entry, x)) => (x, entry)
  }
... Process the RDD result as required.
The RDD temp2 is changed from RDD[String] to RDD[(String, String)] for the join operation. Never mind the duplicated value.
If you are SCPing the files into the source folder, and Spark is reading from that folder, it can happen that half-written files are picked up by Spark, since SCP can take some time to copy.
That will happen for sure.
Your task is to avoid writing directly into that source folder, so that Spark doesn't pick up incomplete files.
