Autodesk Design Automation API extract Text from DWG file - text

I would like to use the Autodesk Design Automation API to extract all Text and Header information from a .dwg file into a json object. Is this possible with the Design Automation API?
Any example would help.

@Kaliph, yes: without a plugin written in .NET, C++, or Lisp, it is impossible to extract block attributes with a script alone. I'd recommend .NET, which is easier to get started with if you are not familiar with C++.
Firstly, I'd suggest you take a look at the training labs of AutoCAD .NET API:
Pick the latest version if you have the latest version of AutoCAD installed; the main API workflow is the same across versions, though. You can also pick C++ (ObjectARX) if you like.
The tutorials above demonstrate how to work with blocks, and the blog post below explains how to get attributes:
I have copied the code here for convenience:
    using Autodesk.AutoCAD;
    using Autodesk.AutoCAD.Runtime;
    using Autodesk.AutoCAD.ApplicationServices;
    using Autodesk.AutoCAD.DatabaseServices;
    using Autodesk.AutoCAD.EditorInput;

    namespace MyApplication
    {
        public class DumpAttributes
        {
            [CommandMethod("LISTATT")] // command name is an assumption
            public void ListAttributes()
            {
                Editor ed =
                    Application.DocumentManager.MdiActiveDocument.Editor;
                Database db =
                    HostApplicationServices.WorkingDatabase;
                Transaction tr =
                    db.TransactionManager.StartTransaction();

                // Start the transaction
                try
                {
                    // Build a filter list so that only
                    // block references are selected
                    TypedValue[] filList = new TypedValue[1] {
                        new TypedValue((int)DxfCode.Start, "INSERT")
                    };
                    SelectionFilter filter =
                        new SelectionFilter(filList);
                    PromptSelectionOptions opts =
                        new PromptSelectionOptions();
                    opts.MessageForAdding = "Select block references: ";
                    PromptSelectionResult res =
                        ed.GetSelection(opts, filter);

                    // Do nothing if selection is unsuccessful
                    if (res.Status != PromptStatus.OK)
                        return;

                    SelectionSet selSet = res.Value;
                    ObjectId[] idArray = selSet.GetObjectIds();
                    foreach (ObjectId blkId in idArray)
                    {
                        BlockReference blkRef =
                            (BlockReference)tr.GetObject(blkId, OpenMode.ForRead);
                        BlockTableRecord btr =
                            (BlockTableRecord)tr.GetObject(
                                blkRef.BlockTableRecord, OpenMode.ForRead);
                        ed.WriteMessage(
                            "\nBlock: " + btr.Name
                        );

                        // List the tag and value of every attribute on the block
                        AttributeCollection attCol =
                            blkRef.AttributeCollection;
                        foreach (ObjectId attId in attCol)
                        {
                            AttributeReference attRef =
                                (AttributeReference)tr.GetObject(attId, OpenMode.ForRead);
                            string str =
                                ("\n Attribute Tag: "
                                + attRef.Tag
                                + "\n Attribute String: "
                                + attRef.TextString
                                );
                            ed.WriteMessage(str);
                        }
                    }
                    tr.Commit();
                }
                catch (Autodesk.AutoCAD.Runtime.Exception ex)
                {
                    ed.WriteMessage(("Exception: " + ex.Message));
                }
                finally
                {
                    tr.Dispose();
                }
            }
        }
    }
I have a sample on making signs on a drawing. It covers getting attributes and modifying attributes:
And I also have a sample on getting Table cells of a drawing:
Hope these could help you to make the plugin for your requirements.

What do you mean by "Header" information? Can you give an example?
Finding and extracting all text objects is relatively easy if you are familiar with the AutoCAD .NET API (or C++ or Lisp).
Here's an example that extracts blocks and layer names:


generate handwash and toilets with Revit api

Within Revit, toilets or handwash basins can be placed using the plumbing fixture option under Systems. Is it possible to create them using the Revit API? The only thing I have seen is the creation of certain types of systems, as in the following code:
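    public bool createHotWater(Connector baseConector, ConnectorSet set,
        Document doc)
    {
        try {
            using (var trans = new Transaction(doc, "SystemHotWater")) {
                trans.Start();
                PipingSystem piping = doc.Create.NewPipingSystem(
                    baseConector, set, PipeSystemType.SupplyHotWater);
                trans.Commit();
                return true;
            }
        }
        catch (Exception e) { return false; }
    }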
I know that the above code only creates a hot water system, but I would like to know if there is an option to create toilets from the Revit api.
Yes, certainly this can be achieved programmatically as well as manually through the user interface. For this very purpose, the Family API for Creating Family Definitions was introduced in Revit 2010.

create custom module for pdf manipulation

I want to create a custom Kofax module. When it comes to the batch processing the scanned documents get converted to PDF files. I want to fetch these PDF files, manipulate them (add a custom footer to the PDF document) and hand them back to Kofax.
So what I know so far:
create Kofax export scripts
add a custom module to Kofax
I have the APIRef.chm (Kofax.Capture.SDK.CustomModule) and the CMSplit as an example project. Unfortunately I struggle getting into it. Are there any resources out there showing step by step how to get into custom module development?
So I know that the IBatch interface represents one selected batch and the IBatchCollection represents the collection of all batches.
I would just like to know how to set up a "Hello World" example to which I could add my own code. I don't think I even need a WinForms application, because I only need to manipulate the PDF files.
Since I realized that your question was rather about how to create a custom module in general, allow me to add another answer. Start with a C# Console Application.
Add Required Assemblies
Below assemblies are required by a custom module. All of them reside in the KC's binaries folder (by default C:\Program Files (x86)\Kofax\CaptureSS\ServLib\Bin on a server).
Setup Part
Add a new User Control and Windows Form for setup. This is purely optional - a CM might not even have a setup form, but I'd recommend adding it regardless. The user control is the most important part, here - it will add the menu entry in KC Administration, and initialize the form itself:
    public interface ISetupForm
    {
        AdminApplication Application { set; }
        void ActionEvent(int EventNumber, object Argument, out int Cancel);
    }

    public class SetupUserControl : UserControl, ISetupForm
    {
        private AdminApplication adminApplication;
        public AdminApplication Application
        {
            set
            {
                value.AddMenu("Quipu.KC.CM.Setup", "Quipu.KC.CM - Setup", "BatchClass");
                adminApplication = value;
            }
        }
        public void ActionEvent(int EventNumber, object Argument, out int Cancel)
        {
            Cancel = 0;
            if ((KfxOcxEvent)EventNumber == KfxOcxEvent.KfxOcxEventMenuClicked && (string)Argument == "Quipu.KC.CM.Setup")
            {
                SetupForm form = new SetupForm();
            }
        }
    }
Runtime Part
Since I started with a console application, I could go ahead and put all the logic into Program.cs. Note that this is for demo purposes only, and I would recommend adding specific classes and forms later on. The example below logs into Kofax Capture, grabs the next available batch, and just outputs its name.
    class Program
    {
        static void Main(string[] args)
        {
            AppDomain.CurrentDomain.AssemblyResolve += (sender, eventArgs) => KcAssemblyResolver.Resolve(eventArgs);
            Run(args);
        }

        static void Run(string[] args)
        {
            // start processing here
            // todo encapsulate this to a separate class!

            // login to KC
            var login = new Login();
            login.EnableSecurityBoost = true;
            login.ApplicationName = "Quipu.KC.CM";
            login.Version = "1.0";
            login.ValidateUser("Quipu.KC.CM.exe", false, "", "");
            var session = login.RuntimeSession;

            // todo add timer-based polling here (note: mutex!)
            var activeBatch = session.NextBatchGet(login.ProcessID);
            Console.WriteLine(activeBatch.Name);
        }
    }
Registering, COM-Visibility, and more
Registering a Custom Module is done via RegAsm.exe and ideally with the help of an AEX file. Here's an example - please refer to the documentation for more details and all available settings.
Minimal CM
[Minimal CM]
Description=Minimal Template for a Custom Module in C#
SetupProgram=Minimal CM Setup
[Setup Programs]
Minimal CM Setup
[Minimal CM Setup]
Last but not least, make sure your assemblies are COM-visible:
I put up the entire code on GitHub, feel free to fork it. Hope it helps.
Kofax exposes a batch as XML, and DBLite is basically a wrapper for that XML. The structure is explained in AcBatch.htm and AcDocs.htm (to be found under the CaptureSV directory). Here's the basic idea (just documents are shown):
A single document has child elements itself such as pages, and multiple properties such as Confidence, FormTypeName, and PDFGenerationFileName. This is what you want. Here's how you would navigate down the document collection, storing the filename in a variable named pdfFileName:
    IACDataElement runtime = activeBatch.ExtractRuntimeACDataElement(0);
    IACDataElement batch = runtime.FindChildElementByName("Batch");
    var documents = batch.FindChildElementByName("Documents").FindChildElementsByName("Document");
    for (int i = 0; i < documents.Count; i++)
    {
        // 1-based index in kofax
        var pdfFileName = documents[i + 1]["PDFGenerationFileName"];
    }
Personally, I don't like this structure, so I created my own wrapper for their wrapper, but that's up to you.
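The navigation described above (a batch containing documents, each carrying a PDFGenerationFileName property) can be tried outside Kofax on plain XML. The following is only a sketch in Java using the JDK's DOM parser: the sample XML is hypothetical and heavily simplified, it assumes the property is stored as an XML attribute on each Document element, and the class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class BatchXmlSketch {
    // Hypothetical, heavily simplified batch XML. Real batches carry many
    // more elements and attributes (see AcBatch.htm and AcDocs.htm).
    static final String SAMPLE =
        "<Batch Name=\"demo\"><Documents>"
        + "<Document PDFGenerationFileName=\"doc1.pdf\"/>"
        + "<Document PDFGenerationFileName=\"doc2.pdf\"/>"
        + "</Documents></Batch>";

    // Collects the PDFGenerationFileName attribute of every <Document> element.
    static List<String> pdfFileNames(String xml) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList docs = dom.getElementsByTagName("Document");
            List<String> names = new ArrayList<>();
            // DOM node lists are 0-based, unlike the 1-based Kofax collection.
            for (int i = 0; i < docs.getLength(); i++) {
                names.add(((Element) docs.item(i)).getAttribute("PDFGenerationFileName"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse batch XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(pdfFileNames(SAMPLE));
    }
}
```

Inside a real custom module you would keep using the IACDataElement calls shown earlier; the sketch only illustrates the shape of the data.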
With regard to the custom module itself, the shipped sample is already a decent start. Basically, you would have a form that shows up if the user launches the module manually; it is entirely optional if the work happens in the background, preferably as a Windows service. I like to start with a console application, adding forms only when needed. Here, I would either launch the form as follows or start the service. Note that I have different branches in case the user wants to install my custom module as a service:
    else if (Environment.UserInteractive)
    {
        // run as module
        Application.Run(new RuntimeForm(args));
    }
    else
    {
        // run as service
        ServiceBase.Run(new CustomModuleService());
    }
The runtime itself just logs you into Kofax Capture, registers event handlers, and processes batch by batch:
// login to KC
cm = new CustomModule();
cm.Login("", "");
// add progress event handlers
cm.BatchOpened += Cm_BatchOpened;
cm.BatchClosed += Cm_BatchClosed;
cm.DocumentOpened += Cm_DocumentOpened;
cm.DocumentClosed += Cm_DocumentClosed;
cm.ErrorOccured += Cm_ErrorOccured;
// process in background thread so that the form does not freeze
worker = new BackgroundWorker();
worker.DoWork += (s, a) => Process();
Then, your CM fetches the next batch. This can either make use of Kofax' Batch Notification Service, or be based on a timer. For the former, just handle the BatchAvailable event of the session object:
session.BatchAvailable += Session_BatchAvailable;
For the latter, define a timer, preferably with a configurable polling interval:
pollTimer.Interval = pollIntervalSeconds * 1000;
pollTimer.Elapsed += PollTimer_Elapsed;
pollTimer.Enabled = true;
When the timer elapses, you could do the following:
private void PollTimer_Elapsed(object sender, System.Timers.ElapsedEventArgs e)
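The body of the handler is not shown here. One concern hinted at by the earlier "note: mutex!" comment is that a timer tick may fire while the previous one is still processing a batch (System.Timers.Timer raises Elapsed on thread-pool threads, so handlers can overlap). A minimal sketch of a guard against that, written in plain Java rather than C#, with hypothetical names (GuardedPoller, tick) and a placeholder where the batch processing would go:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class GuardedPoller {
    private final AtomicBoolean busy = new AtomicBoolean(false);
    private final AtomicInteger processed = new AtomicInteger();
    private final Runnable work;

    GuardedPoller(Runnable work) {
        this.work = work;
    }

    // Called on every timer tick. Returns false, without doing any work,
    // when a previous tick is still running.
    boolean tick() {
        if (!busy.compareAndSet(false, true)) {
            return false;
        }
        try {
            work.run(); // placeholder for "fetch and process the next batch"
            processed.incrementAndGet();
            return true;
        } finally {
            busy.set(false); // always release the guard, even on failure
        }
    }

    int processedCount() {
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedPoller poller = new GuardedPoller(() -> System.out.println("polling for next batch"));
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(poller::tick, 0, 100, TimeUnit.MILLISECONDS);
        Thread.sleep(350);
        timer.shutdownNow();
    }
}
```

In C#, the same idea is often expressed by stopping the timer at the start of the handler and restarting it at the end, or with Monitor.TryEnter; which one fits depends on how the module is hosted.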

Retrieve tags from user profile in IBM Connections using Social Business Toolkit does not give tags

Hi, we want to add tags to a user profile programmatically.
We are using the Social Business Toolkit for this purpose.
More specifically, we use the ProfileService. First we need to get the current tags, and this always gives 0 results.
    String userEmail = "";
    Map<String, String> params = new HashMap<String, String>();
    EntityList<Tag> tags = app.profileService.getTags(userEmail, params);

and

    EntityList<Tag> tags = app.profileService.getTags("427ffbb1-ab50-4e82-97b2-46bf5584e799");

Both return no tags (tags.size() == 0) when we try to print them:
    if (tags.size() <= 0) {
        System.out.println("No tags to be displayed");
    }
    for (Tag tag : tags) {
        System.out.println("Tag : " + tag.getTerm());
        System.out.println("Tag Frequency: " + tag.getFrequency());
        System.out.println("Tag Visibility : "
            + tag.isVisible());
    }
I have tried to test this with Connections Cloud and Greenhouse, but on those platforms I get authorization errors.
I tried this with both a 4.5 and a 5.0 Connections environment, and both give the same result.
However when I use the URL
I do get (XML) results.
We are using version 1.1.9.
On both environments, you need to use the key to access a user's tags.
You can get the key from the service document.
Those two environments do not use targetEmail.


I've been working on a text extractor for .docx files using Tika. It works fine for basic text and for text in tables and text boxes, but it fails for images.
How do I get text from an image? Tesseract can be used together with Tika to get text from a standalone image, but for that I would need to extract the image from the document. How do I do this?
Kindly help if anybody has worked on something like this.
This is the code that works fine for text, text boxes, and tables, but not for images:
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import org.apache.tika.exception.TikaException;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.parser.microsoft.ooxml.OOXMLParser;
    import org.apache.tika.sax.BodyContentHandler;
    import org.xml.sax.SAXException;

    public class BasicDocumentExtractor {
        public static void main(final String[] args) throws IOException, SAXException, TikaException {
            //detecting the file type
            BodyContentHandler handler = new BodyContentHandler();
            Metadata metadata = new Metadata();
            FileInputStream inputstream = new FileInputStream(new File("D:\\Nidhi\\sw\\ws\\Hello.docx"));
            ParseContext pcontext = new ParseContext();
            //OOXml parser
            OOXMLParser msofficeparser = new OOXMLParser();
            msofficeparser.parse(inputstream, handler, metadata, pcontext);
            System.out.println("Contents of the document:" + handler.toString());
            /*System.out.println("Metadata of the document:");
            String[] metadataNames = metadata.names();
            for(String name : metadataNames){
                System.out.println(name + ": " + metadata.get(name));
            }*/
        }
    }
You need to enable recursion in Tika in order to get the embedded images. The simplest way is normally just to use the RecursiveParserWrapper to do it for you.
If you use it, your code would instead be roughly
    BodyContentHandler handler = new BodyContentHandler();
    Metadata metadata = new Metadata();
    ParseContext context = new ParseContext();
    TikaInputStream input = TikaInputStream.get(new File("D:\\Nidhi\\sw\\ws\\Hello.docx"));
    Parser wrapped = new AutoDetectParser();
    RecursiveParserWrapper wrapper = new RecursiveParserWrapper(wrapped,
        new BasicContentHandlerFactory(BasicContentHandlerFactory.HANDLER_TYPE.TEXT, 60));
    wrapper.parse(input, handler, metadata, context);
    // Get metadata from children
    List<Metadata> list = wrapper.getMetadata();
    // Get metadata from main document
    System.out.println("Main doc name is " + metadata.get(TikaCoreProperties.TITLE));
    System.out.println("Contents of the document:" + handler.toString());
After trying hard to do this for the last 24 hours, I figured out a pretty easy way. Since Tika is built on top of POI, this task can be done efficiently using POI directly. Because it is not a direct solution, almost no tutorials are available for it; I hope nobody else has to face this issue in the future. This is the running code that extracts all images from a .docx document:
    public static void getImages() throws Exception {
        XWPFDocument doc = new XWPFDocument(new FileInputStream("D:\\Nidhi\\CDAC\\Images\\test1.docx"));
        List<XWPFPictureData> images = doc.getAllPictures();
        int i = 0;
        while (i < images.size()) {
            XWPFPictureData pic = images.get(i);
            System.out.println(pic.getFileName() + " " + pic.getPictureType() + " " + pic.getData().length + " bytes");
            // Write the raw picture bytes to disk; try-with-resources closes the stream
            try (FileOutputStream fos = new FileOutputStream("D:\\Nidhi\\CDAC\\Images\\b" + i + ".jpg")) {
                fos.write(pic.getData());
            }
            i++;
        }
    }
This works for Word 2007+ (.docx) files; for .doc and similar files, use HWPF in exactly the same manner.
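As an alternative that needs neither Tika nor POI: a .docx file is a ZIP package, and embedded pictures are stored as entries under word/media/. The sketch below reads such entries with only the JDK. It is a different technique from the POI code above, it does not apply to the older binary .doc format, and the class and method names (DocxMedia, extractMedia, fakeDocx) are hypothetical. The in-memory "document" is a stand-in so the example runs without a real file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class DocxMedia {
    // Reads a .docx (ZIP) stream and returns entry name -> bytes for
    // every file stored under word/media/.
    static Map<String, byte[]> extractMedia(InputStream docx) {
        Map<String, byte[]> media = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory() && entry.getName().startsWith("word/media/")) {
                    // readAllBytes stops at the end of the current entry
                    media.put(entry.getName(), zip.readAllBytes());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return media;
    }

    // Builds a tiny ZIP in memory that mimics the layout of a .docx,
    // so the sketch can be run without a real document.
    static byte[] fakeDocx() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write("<w:document/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("word/media/image1.png"));
            zip.write(new byte[] {1, 2, 3});
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, byte[]> media = extractMedia(new ByteArrayInputStream(fakeDocx()));
        media.forEach((name, bytes) -> System.out.println(name + " (" + bytes.length + " bytes)"));
    }
}
```

With a real document you would pass a FileInputStream for the .docx file and write each byte array to disk under its original name, which also preserves the actual image type instead of assuming .jpg.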

Customizing upload file functionality in SharePoint picture library

Can anyone help me? I want to customize the upload functionality so that the type of an image uploaded to the picture library is validated.
Where can I put my script? Can anyone advise?
You could use ItemAdding. In the ItemAdding event method, check the extension of the document before it is uploaded to the library; if the document is invalid, show an error message.
Your code would look something like this:
    protected string[] ValidExtensions = new string[] { "png", "jpeg", "gif" };

    public override void ItemAdding(SPItemEventProperties properties)
    {
        string strFileExtension = Path.GetExtension(properties.AfterUrl);
        bool isValidExtension = false;
        string strValidFileTypes = string.Empty;
        using (SPWeb web = properties.OpenWeb())
        {
            foreach (string strValidExt in ValidExtensions)
            {
                if (strFileExtension.ToLower().EndsWith(strValidExt.ToLower()))
                {
                    isValidExtension = true;
                }
                strValidFileTypes += (string.IsNullOrEmpty(strValidFileTypes) ? "" : ", ") + strValidExt;
            }
            // If the extension is not valid, cancel the upload and
            // redirect to the error message page.
            if (!isValidExtension)
            {
                properties.Status = SPEventReceiverStatus.CancelWithRedirectUrl;
                properties.RedirectUrl = properties.WebUrl + "/_layouts/error.aspx?ErrorText=" + "Only " + strValidFileTypes + " extensions are allowed";
            }
        }
    }
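The extension check at the heart of this handler can be tried in isolation. The following is a minimal sketch in plain Java, not SharePoint code: the class and method names (ImageExtensionCheck, extensionOf, isAllowed, errorText) are hypothetical, and the allowed list simply mirrors the one in the answer. Unlike the EndsWith comparison above, it compares the whole extension, so a name such as photo.xpng would be rejected.

```java
import java.util.List;
import java.util.Locale;

final class ImageExtensionCheck {
    // Allowed extensions, mirroring the list used in the answer.
    static final List<String> VALID_EXTENSIONS = List.of("png", "jpeg", "gif");

    // Returns the lower-cased extension without the dot, or "" if the
    // last path segment has no extension.
    static String extensionOf(String url) {
        int slash = Math.max(url.lastIndexOf('/'), url.lastIndexOf('\\'));
        int dot = url.lastIndexOf('.');
        if (dot <= slash || dot == url.length() - 1) {
            return "";
        }
        return url.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // True only when the whole extension matches an allowed entry.
    static boolean isAllowed(String url) {
        return VALID_EXTENSIONS.contains(extensionOf(url));
    }

    // Message text in the same style as the redirect in the answer.
    static String errorText() {
        return "Only " + String.join(", ", VALID_EXTENSIONS) + " extensions are allowed";
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("Pictures/logo.PNG"));
        System.out.println(isAllowed("Pictures/report.pdf"));
        System.out.println(errorText());
    }
}
```

Note that checking the extension only validates the file name; it says nothing about the actual content of the uploaded file.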
You could use SPItemEventReceiver for your library and add your logic into ItemUpdating() and ItemAdding() methods.
You can try creating a custom list template and replacing the default NewForm.aspx and EditForm.aspx pages there. These custom form templates need not contain the same user controls and buttons as the default picture library template. You could, for example, create a Silverlight web part with a rich UI to upload images. The more you want to deviate from the default, the more code you'll have to write...
An OOTB solution I can think of would be a workflow that every new picture is forced to run through, but that would be overkill for the end user...
Of course, if you're able to validate by using just the meta-data in ItemAdding as the others suggest, it'd be a huge time-saver.
--- Ferda
