How to get changelist details for provided users within a specified date range? - Perforce

I need to collect Perforce changelist details for the provided users within a provided date range (i.e. from 01-06-22 to 25-12-22) in CSV format.
Expected output in CSV, one row per changelist:

S.No,Perforce Username,Change List,Submitted Date,Workspace,Description
1,dary.spitzer,123456,02-08-2022,Daryl_Spitzer_Workspace,Fixed alarm issues
2,shadkam.san,78910,24-12-2022,Shadkam_san_Workspace,PostgreSQL Support added
Thank you very much for reading and any help is much appreciated.

Use p4 changes with the -u flag to specify the user and the dates given as a revision range argument:
C:\Perforce\workshop>p4 changes -u samwise @2009/01/01,2010/01/01
Change 7479 on 2009/11/12 by samwise@samwise-silver 'Fix typo in last change. '
Change 7113 on 2009/01/22 by samwise@samwise-silver 'Make VSStoP4 html page a redire'
To reformat this into something resembling a CSV at the CLI you could use the -F flag:
p4 -Ztag -F %user%,%change%,%client%,%desc% changes -u samwise @2009/01/01,2010/01/01
samwise,7479,samwise-silver,Fix typo in last change.
samwise,7113,samwise-silver,Make VSStoP4 html page a redire
I would personally use Python rather than Bash to finish massaging this into the desired form though:
import csv
from datetime import datetime
import sys
from P4 import P4

out = csv.writer(sys.stdout)
out.writerow([
    "S.No",
    "Perforce Username",
    "Change List",
    "Submitted Date",
    "Workspace",
    "Description"
])

my_users = {'samwise'}

with P4().connect() as p4:
    # Filtering against a set of users makes it easy to handle multiple users,
    # and keeps S.No sequential for only the rows that are written.
    changes = (c for c in p4.run_changes('@2009/01/01,2010/01/01')
               if c['user'] in my_users)
    for i, change in enumerate(changes, 1):
        out.writerow([
            i,
            change['user'],
            change['change'],
            datetime.fromtimestamp(int(change['time'])).date(),
            change['client'],
            change['desc'].strip()
        ])
produces:
S.No,Perforce Username,Change List,Submitted Date,Workspace,Description
1,samwise,7479,2009-11-12,samwise-silver,Fix typo in last change.
2,samwise,7113,2009-01-22,samwise-silver,Make VSStoP4 html page a redire
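To match the question's requirements more closely, the same approach can be pointed at the asker's users and date range. This is only a sketch under assumptions: the usernames and dates are the hypothetical ones from the question's example, the dd-mm-yyyy output format is assumed from that example, and the -l flag is passed so that descriptions are not truncated.

import csv
from datetime import datetime
import sys
from P4 import P4

# Hypothetical values taken from the question's example output.
my_users = {'dary.spitzer', 'shadkam.san'}
date_range = '@2022/06/01,2022/12/25'   # 01-06-22 to 25-12-22

out = csv.writer(sys.stdout)
out.writerow(["S.No", "Perforce Username", "Change List",
              "Submitted Date", "Workspace", "Description"])

with P4().connect() as p4:
    # -l asks for full (untruncated) change descriptions.
    changes = (c for c in p4.run_changes('-l', date_range)
               if c['user'] in my_users)
    for i, change in enumerate(changes, 1):
        submitted = datetime.fromtimestamp(int(change['time']))
        out.writerow([
            i,
            change['user'],
            change['change'],
            submitted.strftime('%d-%m-%Y'),   # dd-mm-yyyy as in the example
            change['client'],
            change['desc'].strip()
        ])

Redirecting stdout to a file gives the CSV the question asks for.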

Related

How to read the most recent Excel export into a Pandas dataframe without specifying the file name?

I frequent a real estate website that shows recent transactions, from which I will download data to parse within a Pandas dataframe. Everything about this dataset remains identical every time I download it (regarding the column names, that is).
The name of the Excel output may change, though. For example, if I have already downloaded a few of these into my Downloads folder, the file that's exported may read "Generic_File_(3)" or "Generic_File_(21)" if I already have a few older "Generic_File" exports in that folder from a previous export.
Ideally, I'd like my workflow to look like this: export this Excel file of real estate sales, then run a Python script to read in the most recent export as a Pandas dataframe. The catch is, I don't want to have to go in and change the filename in the script to match the appended number of the Excel export every time. I want the pd.read_excel method to simply read the "Generic_File" that is appended with the largest number (which will obviously correspond to the most recent export).
I suppose I could always just delete old exports out of my Downloads folder so the newest, freshest export is always named the same ("Generic_File", in this case), but I'm looking for a way to ensure I don't have to do this. Are wildcards the best path forward, or is there some other method to always read in the most recently downloaded Excel file from my Downloads folder?
I would use the os package and create a method to read the file names in the Downloads folder. By parsing the filename strings you could then find the file following your specified format with the highest copy number. Something like the following might help you get started.
import os

downloads = os.listdir('C:/Users/[username here]/Downloads/')
is_file = ['.' in item for item in downloads]
files = [item for keep, item in zip(is_file, downloads) if keep]
# INSERT CODE HERE TO IDENTIFY THE FILE OF INTEREST
Regex might be the best way to find matches if you have a diverse listing of files in your downloads folder.
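As a minimal sketch of that last step, assuming the files follow the "Generic_File_(N).xlsx" naming from the question (the folder path and the .xlsx extension are assumptions):

import os
import re

import pandas as pd

downloads = 'C:/Users/[username here]/Downloads/'

# Matches "Generic_File.xlsx" as well as "Generic_File_(7).xlsx", capturing the copy number.
pattern = re.compile(r'^Generic_File(?:_\((\d+)\))?\.xlsx$')

def copy_number(name):
    match = pattern.match(name)
    if match is None:
        return None
    return int(match.group(1) or 0)   # a plain "Generic_File.xlsx" counts as copy 0

candidates = [f for f in os.listdir(downloads) if copy_number(f) is not None]
latest = max(candidates, key=copy_number)

df = pd.read_excel(os.path.join(downloads, latest))

Alternatively, picking the file with the newest modification time (max over the full paths with key=os.path.getmtime) avoids parsing the names at all.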

Generate a list of all Merge Requests merged between two tags, with all information, in a csv file

I'm trying to build a csv file with all the information about merge requests merged between two tags. I'm trying to get this kind of information for each merge request:
UID; ID; TITLE OF MR; REPOSITORIES; STATUS; MILESTONE; ASSIGNED; CREATION-DATE; MERGED-DATE; LABEL; URL.
For now I have a command that gets all merge requests merged between two tags with some of the information and puts it in a csv file:
git log --merges --first-parent master --pretty=format:"%aD;%an;%H;%s;%b" TagA..TagB --shortstat >> MRList.csv
How can I get the other information? In the git log format documentation I only saw the options already in my command, and I can't find ones for the other fields.
Thank you for your help!
I've written a small Python script to do this. Usage:
git log --pretty="format:%H" <start>..<end> | python collect.py
Script:
#!/usr/bin/env python
import sys
import requests
endpoint = 'https://gitlab.com'
project_id = '4242'
mrs = set()
for line in sys.stdin:
    hash = line.rstrip('\n')
    r = requests.get(endpoint + '/api/v4/projects/' + project_id + '/repository/commits/' + hash + '/merge_requests')
    for mr in r.json():
        if mr['id'] in mrs:
            continue
        mrs.add(mr['id'])
        print('!{} {} ({})'.format(mr['iid'], mr['title'], mr['web_url']))
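If you want the CSV columns listed in the question rather than a plain printout, the same loop can write rows with Python's csv module. This is only a sketch using field names returned by the GitLab merge request API (id, iid, title, state, milestone, assignee, created_at, merged_at, labels, web_url); mapping UID to id and ID to iid is an assumption, the REPOSITORIES column is dropped since the project is fixed by project_id, and a PRIVATE-TOKEN header (shown as a placeholder) may be needed for private projects.

#!/usr/bin/env python
import csv
import sys

import requests

endpoint = 'https://gitlab.com'
project_id = '4242'
headers = {}   # e.g. {'PRIVATE-TOKEN': '<your token>'} for private projects

out = csv.writer(sys.stdout, delimiter=';')
out.writerow(['UID', 'ID', 'TITLE OF MR', 'STATUS', 'MILESTONE', 'ASSIGNED',
              'CREATION-DATE', 'MERGED-DATE', 'LABEL', 'URL'])

seen = set()
for line in sys.stdin:
    commit = line.rstrip('\n')
    url = '{}/api/v4/projects/{}/repository/commits/{}/merge_requests'.format(
        endpoint, project_id, commit)
    for mr in requests.get(url, headers=headers).json():
        if mr['id'] in seen:
            continue
        seen.add(mr['id'])
        out.writerow([
            mr['id'],
            mr['iid'],
            mr['title'],
            mr['state'],
            (mr['milestone'] or {}).get('title', ''),
            (mr['assignee'] or {}).get('username', ''),
            mr['created_at'],
            mr['merged_at'] or '',
            ','.join(mr['labels']),
            mr['web_url'],
        ])

Usage is the same as above, redirected to a file: git log --pretty="format:%H" <start>..<end> | python collect.py > MRList.csv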
For now this is not yet between two tags, but with GitLab 13.6 (November 2020) you have:
Export merge requests as a CSV
Many organizations are required to document changes (merge requests) and the data surrounding those transactions such as who authored the MR, who approved it, and when that change was merged into production. Although not an exhaustive list, it highlights the recurring theme of traceability and the need to export this data from GitLab to serve an audit or other regulatory requirement.
Previously, you would need to use GitLab’s merge requests API to compile this data using custom tooling. Now, you can click one button and receive a CSV file that contains the necessary chain of custody information you need.
See Documentation and Issue.

Eyed3 write ID3 YEAR tag to Mp3

Using eyed3 I have no problems setting all tags other than the YEAR tag; there is also no problem reading .getBestDate(), but I can't write the tag.
import eyed3
audiofile = eyed3.load("Example.mp3")
print(audiofile.tag.getBestDate()) # returns the Year
audiofile.initTag()
audiofile.tag.xxxxxxxxx = ("1843") # how to write the Year?
audiofile.tag.save()
I have trawled the manual https://buildmedia.readthedocs.org/media/pdf/eyed3/latest/eyed3.pdf and google but just can't figure it out.
The year tag is simply year:
audiofile.tag.year
But I've never been able to get it to successfully add the year, never see any errors, it's just always blank.
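For what it's worth, in recent eyed3 releases the writable fields are date properties rather than a plain year, which would be consistent with a bare year assignment silently doing nothing. A minimal sketch, assuming eyed3 0.8+ and that the recording date is the "year" you want to write:

import eyed3
from eyed3.core import Date

audiofile = eyed3.load("Example.mp3")
# Only create a fresh tag if the file has none; calling initTag() unconditionally
# wipes any frames that were already there.
if audiofile.tag is None:
    audiofile.initTag()

# getBestDate() reads from the date fields, so writing one of them sets the "year".
audiofile.tag.recording_date = Date(1843)
audiofile.tag.save()

print(eyed3.load("Example.mp3").tag.getBestDate())   # 1843

release_date and original_release_date are set the same way if those are the frames your player reads.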

Efficient algorithm for cleaning large csv files

So I've got a large database contained inside csv files; there are about 1000+ of them, with about 24 million rows per csv, and I want to clean it up.
This is an example of the data in the csv:
So as you can see, there are rows that have the same 'cik', and I want to clean all of them so that we get unique 'cik' values and do not have any duplicates.
I've tried to do it with python, but couldn't manage to do it.
Any suggestions would be helpful.
The tsv-uniq tool from eBay's TSV Utilities can do this type of duplicate removal (disclaimer: I'm the author). tsv-uniq is similar to the Unix uniq program, with two advantages: Data does not need to be sorted and individual fields can be used as the key. The following commands would be used to remove duplicates on the cik and cik plus ip fields:
$ # Dedup on cik field (field 5)
$ tsv-uniq -H -f 5 file.tsv > newfile.tsv
$ # Dedup on both cik and ip fields (fields 1 and 5)
$ tsv-uniq -H -f 1,5 file.tsv > newfile.tsv
The -H option preserves the header. The above forms use TAB as the field delimiter. To use comma or another character use the -d|--delimiter option as follows:
$ tsv-uniq -H -d , -f 5 file.csv > newfile.csv
tsv-uniq does not support CSV-escape syntax, but it doesn't look like your dataset needs escapes. If your dataset does use escapes, it can likely be converted to TSV format (without escapes) using the csv2tsv tool in the same package. The tools run on Unix and MacOS, and there are prebuilt binaries on the Releases page.
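If installing the tool is not an option, a rough pure-Python equivalent of tsv-uniq -H -d , -f 1,5 is to stream the file once and keep a set of the keys already seen, so no sorting is needed; the file names and column positions below are the ones used in the examples above:

import csv

seen = set()
with open('file.csv', newline='') as src, open('newfile.csv', 'w', newline='') as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    writer.writerow(next(reader))   # keep the header, like -H
    for row in reader:
        key = (row[0], row[4])      # fields 1 and 5: ip and cik
        if key in seen:
            continue
        seen.add(key)
        writer.writerow(row)

With roughly 24 million rows per file the set of keys has to fit in memory, which is the same trade-off tsv-uniq makes to avoid sorting.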
This is what I used to filter out all the duplicates with the same 'cik' and 'ip'
import pandas as pd
chunksize = 10 ** 5
for chunk in pd.read_csv('log20170628.csv', chunksize=chunksize):
    df = pd.DataFrame(chunk)
    df = df.drop_duplicates(subset=["cik", "ip"])
    df[['ip','date','cik']].to_csv('cleanedlog20170628.csv', mode='a')
But when running the program I got this warning:
sys:1: DtypeWarning: Columns (14) have mixed types. Specify dtype option on import or set low_memory=False.
So I am not sure whether my code has a bug, or if it is something to do with the data from the csv.
I opened the csv to check the data and it seems alright.
I have cut the number of rows from 24 million to about 5 million, which was the goal from the start. But this warning is bugging me...
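The warning itself just means pandas had to guess column types per chunk; passing dtype=str (or low_memory=False) when reading makes it go away. A bigger issue with the chunked approach is that drop_duplicates only sees one chunk at a time and to_csv(mode='a') rewrites the header for every chunk, so duplicates that straddle chunk boundaries survive. A possible variant, reusing the file and column names from the post, that tracks the (cik, ip) keys across chunks:

import pandas as pd

chunksize = 10 ** 5
seen = set()

reader = pd.read_csv('log20170628.csv', chunksize=chunksize, dtype=str)
for i, chunk in enumerate(reader):
    chunk = chunk.drop_duplicates(subset=["cik", "ip"])
    # Drop rows whose (cik, ip) pair already appeared in an earlier chunk.
    keys = list(zip(chunk['cik'], chunk['ip']))
    mask = [key not in seen for key in keys]
    seen.update(keys)
    chunk = chunk[mask]
    chunk[['ip', 'date', 'cik']].to_csv('cleanedlog20170628.csv',
                                        mode='w' if i == 0 else 'a',
                                        header=(i == 0),
                                        index=False)

The set of keys still has to fit in memory; with tens of millions of short string keys that is usually fine, but it is worth keeping an eye on.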

Linux shell script for importing a csv data file to DB2

I am new to Linux and would like to seek your help. The task is to import csv data into DB2. It is done in a shell script, on a scheduled run. The file has a header, which is why I used skipcount 1. The delimiter is a comma, and since that is the default, I did not include COLDEL.
Can you help me troubleshoot why, upon running the script, we get the error below? I am using IMPORT with INSERT_UPDATE because I learned that the LOAD method deletes the whole contents of the table before importing the data from the CSV file. The existing data in the table should not be deleted: records should only be updated if there are changes in the CSV file, and otherwise a new record should be created.
I am also looking at which METHOD should be used for getting the specific values from the CSV file, and I am currently using METHOD P. I am not so sure about the numbering inside its parameter: does it signify how many columns are to be accessed, and should it tally with the ones I am importing from the file?
Below is the script snippet:
db2 connect to MYDB user $USERID using $PASSWORD
LOGFILE=/load/log/MYDBLog.txt
if [ -f /load/data/CUST_GRP.csv ]; then
    db2 "import from /load/data/CUST_GRP.csv of del skipcount 1 modified by usedefaults METHOD P(1,2,3,4,5)
         messages $LOGFILE
         insert_update into myuser.CUST(NUM_ID,NUM_GRP,NUM_TEAM,NUM_LG,NUM_STATUS)";
else
    echo "/load/data/CUST_GRP.csv file not found." >> $LOGFILE;
fi

if [ -f /load/data/CUST_GRP.csv ]; then
    db2 "import from /load/data/CUST_GRP.csv of del skipcount 1 modified by dateformat=\"YYYY-MM-DD\" timeformat=\"HH:MM:SS\" usedefaults METHOD P(1,2,3,4,5,6,7)
         messages $LOGFILE
         insert_update into myuser.MY_CUST(NUM_CUST,DTE_START,TME_START,NUM_CUST_CLSFCN,DTE_END,TME_END,NUM_CUST_TYPE)";
else
    echo "/load/data/CUST_GRP.csv file not found." >> $LOGFILE;
fi
The error I am encountering is this:
SQL0104N An unexpected token "modified" was found following "<identifier>".
Expected tokens may include: "INSERT". SQLSTATE=42601
Thank you!
You can't place clauses in an arbitrary order in the IMPORT command.
Place the skipcount 1 clause after the modified by and METHOD clauses, immediately before messages.
The LOAD command can either INSERT a new portion of data or REPLACE the table contents, emptying the table at the beginning.
