How to put multiple documents into Berkeley-DB XML container? - berkeley-db-xml

I have a directory with a bunch of XML documents and want to put all of them into a container.
In other words, I need to do something like this:
dbxml> putDocument tests/*.xml
I have written a GUI program to do that, but the host server does not have X Windows installed, so it must be done from the command line.

I do a similar thing when reloading certain XML docs into my current application DB. It helps if all of the files share a common naming convention. In Python you could use the following script to add doc001.xml through doc009.xml:
from bsddb3.db import *
from dbxml import *

# Load source files 001 - 009
sourceDir = 'C:/directory-containing-xml-docs/'  # note the trailing slash
mymgr = XmlManager()  # the manager must exist before opening the container
mycontainer = mymgr.openContainer("myDB.dbxml")
xmlucontext = mymgr.createUpdateContext()
for x in range(1, 10):
    docName = "doc00" + str(x) + ".xml"
    xmlinput = mymgr.createLocalFileInputStream(sourceDir + docName)
    mycontainer.putDocument(docName, xmlinput, xmlucontext)
    print 'Added: ' + str(x)
del mycontainer
print '1 - 9 Added'
Hope that helps

You could have a shell script write the list of XML files to another file and then call dbxml_load_container with the -f option.
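For example (a sketch: -f reads document paths from a list file, as described above, but the container flag and file names here are assumptions to verify against your BDB XML version's usage message):
$ ls tests/*.xml > file_list.txt
$ dbxml_load_container -c tests.dbxml -f file_list.txt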

Ended up using a script that lists files and puts everything into the DB.
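Such a script might look like this (a minimal sketch, assuming the same bsddb3/dbxml Python bindings shown above and a hypothetical container name):
import glob
from dbxml import *

mgr = XmlManager()
container = mgr.openContainer("myDB.dbxml")  # hypothetical container name
uc = mgr.createUpdateContext()
for path in sorted(glob.glob("tests/*.xml")):
    stream = mgr.createLocalFileInputStream(path)
    # use the bare file name as the document name
    container.putDocument(path.split('/')[-1], stream, uc)
    print 'Added: ' + path
del container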

Related

Python Glob - Get Full Filenames, but no directory-only names

This code works, but it's returning directory names and filenames. I haven't found a parameter that tells it to return only files or only directories.
Can glob.glob do this, or do I have to call os.something to test if I have a directory or file. In my case, my files all end with .csv, but I would like to know for more general knowledge as well.
In the loop, I'm reading each file, so the script currently bombs when it tries to open a directory name as a filename.
files = sorted(glob.glob(input_watch_directory + "/**", recursive=True))
for loop_full_filename in files:
    print(loop_full_filename)
Results:
c:\Demo\WatchDir\
c:\Demo\WatchDir\2202
c:\Demo\WatchDir\2202\07
c:\Demo\WatchDir\2202\07\01
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_51.csv
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_52.csv
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_53.csv
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_54.csv
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_55.csv
c:\Demo\WatchDir\2202\07\05
c:\Demo\WatchDir\2202\07\05\polygonData_2022_07_05__12_00.csv
c:\Demo\WatchDir\2202\07\05\polygonData_2022_07_05__12_01.csv
Results needed:
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_51.csv
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_52.csv
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_53.csv
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_54.csv
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_55.csv
c:\Demo\WatchDir\2202\07\05\polygonData_2022_07_05__12_00.csv
c:\Demo\WatchDir\2202\07\05\polygonData_2022_07_05__12_01.csv
For this specific program, I can just check whether the file name contains .csv, but I would like to know the general approach for future reference.
Line:
files = sorted(glob.glob(input_watch_directory + "/**", recursive=True))
replace with the line:
files = sorted(glob.glob(input_watch_directory + "/**/*.*", recursive=True))
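Note that "/**/*.*" only matches names containing a dot. A more general way (a sketch) is to filter the glob results with os.path.isfile, which keeps regular files regardless of extension:
import glob
import os

# keep only regular files, dropping the directory-only entries
files = sorted(
    f for f in glob.glob(input_watch_directory + "/**", recursive=True)
    if os.path.isfile(f)
)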

retrieving files from ftp to specific os directory using python 3

I'm trying to automate a daily FTP transfer using a Python 3 script. I'm having a small issue, though, with writing the files where I want them to be. This is what I'm doing:
import time, os
from ftplib import FTP
from datetime import datetime

today = time.strftime('%d%m%y')
dirName = 'mydir' + today
if not os.path.exists(dirName):
    os.mkdir(dirName)
    print("Directory ", dirName, " Created ")
else:
    print("Directory ", dirName, " already exists")
os.chdir(dirName)
start = datetime.now()

ftp = FTP('ftp')
ftp.login('user', 'pass')
ftpdir = 'localdir' + today
ftp.cwd(ftpdir)

# Get All Files
files = ftp.nlst()

# Print out the files
for file in files:
    print("Downloading..." + file)
    ftp.retrbinary("RETR " + file, open(dirName + file, 'wb').write)
ftp.close()
What I get with this code is that all the downloaded FTP files end up in the folder level above "today", with their filenames starting with the "today" string.
Can someone give a hand here, please?
Thanks in advance.
You have to separate the path components. For a platform-independent solution, use os.path.join:
import os
dirName = os.path.join('mydir', today)
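Applied to the download loop, that might look like this (a sketch reusing the ftp, files, and dirName names from the question):
import os

for file in files:
    print("Downloading..." + file)
    # join the directory and the remote file name explicitly
    local_path = os.path.join(dirName, file)
    with open(local_path, 'wb') as f:
        ftp.retrbinary("RETR " + file, f.write)
ftp.close()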
Solved the issue with a backslash:
# Print out the files
for file in files:
    print("Downloading..." + file)
    ftp.retrbinary("RETR " + file, open(dirName + '\\' + file, 'wb').write)
ftp.close()

files are saved repeatedly with single name, no looping, no ranging

My code runs, but it has one flaw: the files are not saved correctly. For example, let's say I caught 3 JPEG files; when I ran the code, it saved 3 times to slot 1, 3 times to slot 2, and 3 times to slot 3, so I ended up with 3 copies of the same file.
I think there is something wrong with my looping logic?
If I change for n in range(len(soup_imgs)): to for n in range(len(src)):, the operation endlessly saves the last JPEG file.
soup_imgs = soup.find(name='div', attrs={'class':'t_msgfont'}).find_all('img', alt="", src=re.compile(".jpg"))
for i in soup_imgs:
    src = i['src']
    print(src)

dirPath = "C:\\__SPublication__\\"
img_folder = dirPath + '/' + soup_title + '/'
if (os.path.exists(img_folder)):
    pass
else:
    os.mkdir(img_folder)

for n in range(len(src)):
    n += 1
    img_name = dirPath + '/' + soup_title + '/' + str({}).format(n) + '.jpg'
    img_files = open(img_name, 'wb')
    img_files.write(requests.get(src).content)
    print("Outputs:" + img_name)
I am an amateur at coding; I just started not long ago as a hobby of mine. Please give me some guidance, chiefs.
Try this when you are writing your image files:
from os import path

for i, img in enumerate(soup_imgs):
    src = img['src']
    img_name = path.join(dirPath, soup_title, "{}.jpg".format(i))
    with open(img_name, 'wb') as f:
        f.write(requests.get(src).content)
    print("Outputs:{}".format(img_name))
You need to loop over all image sources, rather than using the last src value from a previous for block.
I've also added a safer method for joining directory and file paths that should be OS independent. Finally, when opening a file, always use the with open() as f: construct - this way Python will automatically close the filehandle for you.

not allowing me to copy a file due to file permissions

I have made a program that changes my desktop background picture every user-defined number of seconds, and there are no problems with that part. However, I do have a problem with changing the picture of the log-in screen. I have set up my computer (Windows 7) correctly to change that picture, including editing the registry so that the picture can be changed more than once.
I currently change my login picture manually by moving the picture to a folder I created, C:\windows\system32\oobe\info\backgrounds. I want to automate the task using Python: delete the existing image, copy the new image into the folder, rename it to backgroundDefault.jpg, and repeat every user-defined number of seconds.
I copy, rename, and delete these files using the os module; I've tested the same steps in cmd and they work.
Now what seems to be the problem?
Well, I'm able to find the folder using os.path.exists; however, I'm unable to copy, delete or rename anything because the program doesn't have permission.
It is also worth noting that I have already tried to give my user permission to write in the folder, and I have already tried to give administrator access not only to the Python program but also to py.exe and pyw.exe, which sit in the Windows folder.
Is there a way to give the program permission, or is there another way of changing the folder that I need to move the files to? Or even, is there a different snippet of script I could use to achieve a login background change?
def path_writer_bg():
    folders_path_bg = input("Please type the folders path of the login background here, then press enter"
                            "\n>")
    if os.path.exists(folders_path_bg):
        open("your_path_bg.txt", "w").write(folders_path_bg)
        read_folder_path_bg = open("your_path_bg.txt", "r").read()
        if os.path.exists("task_bg.txt"):
            open("task_bg.txt", "w").write("dir " + read_folder_path_bg + " /s /b >listed_bg.txt")
            file_read_task_bg = open("task_bg.txt", "r").readline()
        else:
            open("task_bg.txt", "w").write("dir " + read_folder_path_bg + " /s /b >listed_bg.txt")
            file_read_task_bg = open("task_bg.txt", "r").readline()
        os.popen(file_read_task_bg)
    else:
        input("invalid input. press enter to retry \n")
        path_writer_bg()

if os.path.exists("your_path_bg.txt"):
    read_folder_path_e_bg = open("your_path_bg.txt", "r").read()
    open("task_bg.txt", "w").write("dir " + read_folder_path_e_bg + " /s /b >listed_bg.txt")
    file_read_task_e_bg = open("task_bg.txt", "r").readline()
    os.popen(file_read_task_e_bg)
else:
    path_writer_bg()

def everything_bg():
    with open("listed_bg.txt") as file_bg:
        num_lines_bg = sum(1 for line_bg in open("listed_bg.txt"))
        for num_bg, line_bg in enumerate(file_bg, 1):
            rand_line_bg = random.randrange(num_lines_bg - 1)
            lines_bg = open("listed_bg.txt", "r").readlines()
            open('temp_bg.txt', 'w').writelines(lines_bg[rand_line_bg])
            wallpaper_bg()

def wallpaper_bg():
    path_bg = open("temp_bg.txt", "r").readline()
    if os.path.exists("C:\\Windows\\System32\\oobe\\info\\backgrounds\\backgroundDefault.jpg"):
        os.popen("del C:\\Windows\\System32\\oobe\\info\\backgrounds\\backgroundDefault.jpg")
    else:
        pass
    if os.popen("copy /y" + path_bg + " C:\\Windows\\System32\\oobe\\info\\backgrounds"):
        os.popen("dir C:\\Windows\\System32\\oobe\\info\\backgrounds /s /b >renamer.txt")
    else:
        pass
    if os.path.exists("renamer.txt"):
        rename = open("renamer.txt", "r").readline()
    else:
        pass
    os.popen("rename " + rename + "backgroundDefault.jpg")

wallpaper_bg()
exit()
time.sleep(10)
everything()
Maybe it is because that specific file can't be changed with Python: Windows doesn't allow software to change files in System32 unless it runs with elevated privileges.
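If it really is a permissions issue, one workaround (a sketch, not the original poster's code; the source image path is hypothetical) is to verify that the script runs elevated and to copy with shutil instead of shelling out:
import ctypes
import shutil
import sys

def is_admin():
    # IsUserAnAdmin is only available on Windows
    try:
        return ctypes.windll.shell32.IsUserAnAdmin() != 0
    except AttributeError:
        return False

if is_admin():
    # shutil.copy needs no shell, unlike os.popen("copy ...")
    shutil.copy(
        "C:\\my_pictures\\next.jpg",  # hypothetical source image
        "C:\\Windows\\System32\\oobe\\info\\backgrounds\\backgroundDefault.jpg",
    )
else:
    sys.exit("Run this script from an elevated (administrator) prompt.")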

How can I extract static libs containing repeated object files?

I'm trying to build a big static library by merging two static libraries. At the moment I'm using the 'ar' command, extracting objects, for example, from 'a.a' and 'b.a', and then reassembling these objects using 'ar' again:
$ ar x a.a
$ ar x b.a
$ ar r merged.a *.o
Unfortunately it isn't working for my purpose, since a.a contains different objects with the SAME NAME. The 'ar' command extracts the repeated objects and overwrites the already-extracted ones with the same name. Even though they share a name, these objects have different symbols, so I get undefined references: some symbols go missing together with the overwritten files.
I have no access to the original objects and have already tried 'ar xP', 'ar xv' and lots of other 'ar' options. Can anyone help me by showing how to merge these libs?
Thanks in advance.
I tried 'ar p', but after talking to a friend we decided the following Python solution would be better. Now it's possible to extract the repeated object files.
def extract_archive(pathtoarchive, destfolder):
    archive = open(pathtoarchive, 'rb')
    global_header = archive.read(8)
    if global_header != '!<arch>\n':
        print "Oops!, " + pathtoarchive + " seems not to be an archive file!"
        exit()
    if destfolder[-1] != '/':
        destfolder = destfolder + '/'
    print 'Trying to extract object files from ' + pathtoarchive
    # We don't need the first and second chunk:
    # they're just symbol and name tables.
    content_descriptor = archive.readline()
    chunk_size = int(content_descriptor[48:58])  # the size field spans bytes 48-58 of the member header
    archive.read(chunk_size)
    content_descriptor = archive.readline()
    chunk_size = int(content_descriptor[48:58])
    archive.read(chunk_size)
    unique_key = 0
    while True:
        content_descriptor = archive.readline()
        if len(content_descriptor) < 60:
            break
        chunk_size = int(content_descriptor[48:58])
        # Suffix each member with a counter so same-named objects don't collide.
        output_obj = open(destfolder + pathtoarchive.split('/')[-1] + '.' + str(unique_key) + '.o', 'wb')
        output_obj.write(archive.read(chunk_size))
        if chunk_size % 2 == 1:
            # Members are 2-byte aligned; skip the padding byte.
            archive.read(1)
        output_obj.close()
        unique_key = unique_key + 1
    archive.close()
    print 'Object files extracted to ' + destfolder + '.'
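Usage would then look like this (assuming the destination folder already exists):
extract_archive('a.a', 'objs/')
extract_archive('b.a', 'objs/')
After that, 'ar r merged.a objs/*.o' reassembles everything, since every extracted object now has a unique name.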
1. Extract the objects from the static library (the library which contains the duplicated objects).
2. Build a new library from the extracted objects.
3. Note that the new library will contain ONLY ONE instance of the duplicated objects.
4. Use the ar t command to produce the lists of the objects in the two libraries (the original one, with duplicates, and the new one, without).
5. Use e.g. vimdiff to check the differences between the two lists.
6. Write down all the differences.
7. Extract only those objects (from step 6) from the original library, using the command ar x my_original_lib.a object.o.
8. Rename the extracted object to any name you like.
9. Use the command ar m my_original_lib.a object.o to move that object.o to the end of the archive.
10. Use the same command as in step 7, and you will extract the second object.o.
11. Give a different name to the newly extracted object.
12. Use both of them to build the new library.
The method holds for ANY number of duplicated objects in the static library: just repeat steps 9 and 7 to extract all the duplicates.
You can rename objects -- their name does not mean anything during linking. This should work:
mkdir merge-objs &&
cd merge-objs &&
ar x ../a.a &&
for j in *.o; do mv $j a-$j; done &&
ar x ../b.a &&
ar r ../merged.a *.o &&
cd .. && rm -rf merge-objs
C++ code which merges many libraries into a single new one, without overwriting possible duplicated objects, is here: http://bazaar.launchpad.net/~paulsepolia/+junk/arbet/files/head:/0025_arbet_FINAL/
