How to find reuse metrics relatively quickly?

I am trying to get counts of conrefs, variables, images, and topics included in a baseline so I can present some metrics about the amount of reuse (some factors are ignored). My approach is to parse these fields:

FISHVARINUSE
FISHLINKS
FISHFRAGMENTLINKS
FISHIMAGELINKS

For each baseline, I get the logical id and version of each map or topic. The DocumentObj.RetrieveLanguageMetadata endpoint only allows you to specify one version, so I make multiple calls per baseline, one for each version included in the baseline. This is very slow, since I have to do this for thousands of baselines.
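For reference, the per-version loop looks roughly like this. This is a minimal Python sketch: `retrieve_language_metadata` is a hypothetical stand-in for the actual DocumentObj.RetrieveLanguageMetadata SOAP call, not the real client API.

```python
# Hypothetical sketch of the current one-call-per-version approach.
# `retrieve_language_metadata` stands in for a SOAP round trip to
# DocumentObj.RetrieveLanguageMetadata; names are illustrative only.

REUSE_FIELDS = ["FISHVARINUSE", "FISHLINKS", "FISHFRAGMENTLINKS", "FISHIMAGELINKS"]

def gather_baseline_metadata(baseline_entries, retrieve_language_metadata):
    """One retrieval call per (logical id, version) pair -- the slow path.

    baseline_entries: iterable of (logical_id, version) tuples.
    retrieve_language_metadata: callable(logical_id, version, fields) -> dict.
    """
    results = []
    for logical_id, version in baseline_entries:
        # Each baseline entry costs a full round trip to the repository,
        # which is what makes thousands of baselines so slow.
        results.append(retrieve_language_metadata(logical_id, version, REUSE_FIELDS))
    return results
```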

Do you know of a faster way to gather this information?

  • Hi Kendall - I'm curious which graph db you're using and how you use it :)

    On the API part, I don't know if we can cover all of this in written correspondence. Let's see how this evolves...

    A baseline contains LogicalIds (typically GUIDs) and Versions. But the metadata you need is at the Language level (typically the source language or the publication's working language). So instead of retrieving objects one by one using LogicalId plus version, you can ask for a report on the baseline that gives you back Language Card Ids.

    1. TD13SP2 - Baseline25.GetReport: one call that offers you a lot of Language Card Ids. The report relies on the saved baseline entries; "ExpandReport" and "CompleteReport" will fill in the gray zones (e.g. somebody adds an image into a topic, but nobody has selected a version for the image in the baseline yet).
    2. TD13SP2 - DocumentObj25.RetrieveMetadataByIshLngRefs allows you to retrieve your requested fields for a whole group of objects in one call.
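    In case it helps, here is a minimal Python sketch of that two-step pattern. The `retrieve_by_lng_refs` callable and the report-item dicts are hypothetical stand-ins for the real Baseline25.GetReport output and the DocumentObj25.RetrieveMetadataByIshLngRefs call; the point is batching many ishlngrefs into one retrieval instead of one call per object/version.

```python
# Hypothetical sketch: batch the Language Card Ids from GetReport into
# a few RetrieveMetadataByIshLngRefs-style calls. Element and field
# names are assumptions based on this thread, not the official WSDL.

def collect_lng_refs(report_items):
    """Pull the language card ids (ishlngref) out of the report items."""
    return [item["ishlngref"] for item in report_items if "ishlngref" in item]

def chunked(refs, size):
    """Yield fixed-size batches so one retrieval call covers many objects."""
    for i in range(0, len(refs), size):
        yield refs[i:i + size]

def gather_reuse_metadata(report_items, retrieve_by_lng_refs, batch_size=250):
    fields = ["FISHVARINUSE", "FISHLINKS", "FISHFRAGMENTLINKS", "FISHIMAGELINKS"]
    results = []
    for batch in chunked(collect_lng_refs(report_items), batch_size):
        # One call per batch instead of one call per object/version.
        results.extend(retrieve_by_lng_refs(batch, fields))
    return results
```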
  • Ok. Apparently I misunderstood the meaning of the reportitem elements. I am using Baseline25.GetReport, but each object element only has a logicalid and version number. The first reportitem, however, has the ishlngref, which I took to be a link rather than the object itself. Thanks!

    I am currently using Blazegraph to store RDF, and then I use SPARQL to query the triples. I've also used OpenRDF's native storage implementation with its API, and when trees made more sense than graphs, I've used XQuery with the data stored in BaseX.
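For the counting side of this, the pattern is a SPARQL GROUP BY/COUNT over the reuse predicates. Here is the same aggregation in plain Python over made-up triples, with the equivalent SPARQL shown in a comment; the `ex:conrefs` vocabulary is purely illustrative, not from any real CMS export.

```python
from collections import Counter

# Illustrative (subject, predicate, object) triples; the ex:conrefs
# predicate is made up -- a real store would hold whatever vocabulary
# you chose when exporting from the CMS.
triples = [
    ("ex:topicA", "ex:conrefs", "ex:snippet1"),
    ("ex:topicB", "ex:conrefs", "ex:snippet1"),
    ("ex:topicC", "ex:conrefs", "ex:snippet2"),
]

# Against a SPARQL endpoint such as Blazegraph, the equivalent query is:
#   SELECT ?target (COUNT(?topic) AS ?uses)
#   WHERE { ?topic ex:conrefs ?target }
#   GROUP BY ?target

uses = Counter(obj for subj, pred, obj in triples if pred == "ex:conrefs")
# ex:snippet1 is reused by two topics, ex:snippet2 by one.
```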

Many of the problems I have had to solve working with DITA CMS systems have come down to accounting for the fact that the CMS (SDL or other) stores XML as text. Inventing my own logic to deal with that text has usually looked like more work than using one of the standards-based solutions like XML or RDF, so I've dealt with it by transferring data out of the CMS into an XML database or graph database in order to work with it.