This repository has been archived by the owner on Sep 2, 2021. It is now read-only.

fluentpython/isis2json

This branch is 7 commits ahead, 1 commit behind bireme:master.

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
lib
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

isis2json: CDS/ISIS to JSON database converter

NOTE: This is Python 2.7 code and this repository is archived.

isis2json.py is a Python/Jython script that exports CDS/ISIS (MST+XRF) or ISO-2709 databases to JSON files, optionally in layouts suited to bulk loading into CouchDB or MongoDB.

Under Jython, both MST+XRF and ISO-2709 files can be read, thanks to the Bruma Java library from BIREME, which is bundled in the lib/ directory. Under CPython, only ISO-2709 files can be read.

A full description of how this script is used can be found in the paper "From ISIS to CouchDB: Databases and Data Models for Bibliographic Records".

Usage

$ ./isis2json.py -h
usage: isis2json.py [-h] [-o OUTPUT.json] [-c] [-m] [-t ISIS_JSON_TYPE]
                    [-q QTY] [-s SKIP] [-i TAG_NUMBER] [-u] [-p PREFIX]
                    [-n] [-k TAG:VALUE]
                    INPUT.(mst|iso)

Convert an ISIS .mst or .iso file to a JSON array

positional arguments:
  INPUT.(mst|iso)     .mst or .iso file to read

optional arguments:
  -h, --help          show this help message and exit
  -o OUTPUT.json, --out OUTPUT.json
                      the file where the JSON output should be written
                        (default: write to stdout)
  -c, --couch         output array within a "docs" item in a JSON document
                        for bulk insert to CouchDB via POST to db/_bulk_docs
  -m, --mongo         output individual records as separate JSON objects,
                        one per line for bulk insert to MongoDB via
                        mongoimport utility
  -t ISIS_JSON_TYPE, --type ISIS_JSON_TYPE
                      ISIS-JSON type, sets field structure:
                        1=string, 2=alist, 3=dict
  -q QTY, --qty QTY   maximum quantity of records to read (default=ALL)
  -s SKIP, --skip SKIP  records to skip from start of .mst (default=0)
  -i TAG_NUMBER, --id TAG_NUMBER
                      generate an "_id" from the given unique TAG field
                        number for each record
  -u, --uuid          generate an "_id" with a random UUID for each record
  -p PREFIX, --prefix PREFIX
                      concatenate prefix to every numeric field tag
                        (ex. 99 becomes "v99")
  -n, --mfn           generate an "_id" from the MFN of each record
                        (available only for .mst input)
  -k TAG:VALUE, --constant TAG:VALUE
                      Include a constant tag:value in every record
                        (ex. -k type:AS)
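The three output layouts described above (the default JSON array, `-c` for CouchDB and `-m` for MongoDB) can be sketched as follows. This is a minimal illustration, not code from isis2json.py: the function `dump_records`, its `mode` parameter and the sample records are hypothetical. It reproduces only the shapes stated in the help text: a plain array, an array under a "docs" key (the body CouchDB's `_bulk_docs` endpoint expects), or one JSON object per line (the layout `mongoimport` reads by default).

```python
import json

def dump_records(records, mode="array"):
    """Serialize a list of record dicts in one of three layouts (hypothetical helper).

    mode="array": a single JSON array (default output)
    mode="couch": {"docs": [...]}, for POST to db/_bulk_docs
    mode="mongo": one JSON object per line, for mongoimport
    """
    if mode == "array":
        return json.dumps(records)
    if mode == "couch":
        return json.dumps({"docs": records})
    if mode == "mongo":
        return "\n".join(json.dumps(rec) for rec in records)
    raise ValueError("unknown mode: %r" % (mode,))

# Hypothetical sample records, each with an "_id" as produced by -i, -u or -n
records = [{"_id": "538886", "2": ["538886"]},
           {"_id": "538887", "2": ["538887"]}]
print(dump_records(records, "mongo"))
```

The line-per-record layout lets a loader stream a large export one record at a time, while the CouchDB layout must be sent as one document.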

ISIS-JSON Record Types

There are many ways to represent CDS/ISIS records in JSON [1]. This utility currently exports ISIS-JSON types 1, 2 and 3.

Given an ISIS record with this structure:

 2 «538886»
10 «Kanda, Paulo Afonso^1USP^2FMUSP^3CRDC^pBrasil^cSão Paulo^rorg»
10 «Smidth, Magali Taino^1USP^2FMUSP^3CRDC^pBrasil^cSão Paulo^rorg»

Below are the three supported representations of that record in JSON:

ISIS-JSON type 1

{"10":
    ["Kanda, Paulo Afonso^1USP^2FMUSP^3CRDC^pBrasil^cSão Paulo^rorg",
     "Smidth, Magali Taino^1USP^2FMUSP^3CRDC^pBrasil^cSão Paulo^rorg"],
 "2":
    ["538886"]
}
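In type 1, each tag becomes a JSON key (as a string), and every occurrence of the field is kept, in order, as a raw string in a list. Repeated fields such as tag 10 therefore share one list. Below is a minimal sketch of that grouping. It assumes the record arrives as a sequence of (tag, value) pairs, and it also mimics the `-p` prefix option. `to_type1` is a hypothetical name, not a function from isis2json.py.

```python
# -*- coding: utf-8 -*-

def to_type1(occurrences, prefix=""):
    """Group (tag, raw_value) pairs into {prefix + tag: [raw_value, ...]} (sketch)."""
    record = {}
    for tag, value in occurrences:
        # JSON object keys must be strings; prefix mimics -p (99 -> "v99")
        record.setdefault(prefix + str(tag), []).append(value)
    return record

# The sample record from above, as (tag, value) pairs in file order
fields = [
    (2, u"538886"),
    (10, u"Kanda, Paulo Afonso^1USP^2FMUSP^3CRDC^pBrasil^cSão Paulo^rorg"),
    (10, u"Smidth, Magali Taino^1USP^2FMUSP^3CRDC^pBrasil^cSão Paulo^rorg"),
]
print(sorted(to_type1(fields, prefix="v")))
```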

ISIS-JSON type 2

{"10":
    [
        [
            ["_", "Kanda, Paulo Afonso"],
            ["1", "USP"],
            ["2", "FMUSP"],
            ["3", "CRDC"],
            ["p", "Brasil"],
            ["c", "São Paulo"],
            ["r", "org"]
        ],
        [
            ["_", "Smidth, Magali Taino"],
            ["1", "USP"],
            ["2", "FMUSP"],
            ["3", "CRDC"],
            ["p", "Brasil"],
            ["c", "São Paulo"],
            ["r", "org"]
        ]
    ],
 "2":
    [
        [
            ["_", "538886"]
        ]
    ]
}
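Type 2 splits each field at the `^` subfield delimiters. The text before the first `^` gets the key "_". Each later chunk contributes its first character as the subfield code and the rest as the value. Below is a minimal sketch of that split. It assumes `^` is always followed by a one-character subfield code. `split_subfields` is a hypothetical name, not isis2json.py's own function. Python's `json` module writes tuples as JSON arrays, so each (code, value) pair is serialized as a two-element list.

```python
# -*- coding: utf-8 -*-
import json

def split_subfields(raw):
    """Split 'main^aX^bY' into [('_', 'main'), ('a', 'X'), ('b', 'Y')] (sketch)."""
    parts = raw.split("^")
    pairs = []
    if parts[0]:
        # text before the first '^' is the main (unlabelled) value
        pairs.append(("_", parts[0]))
    for chunk in parts[1:]:
        if chunk:
            # first character is the subfield code, the rest is its value
            pairs.append((chunk[0], chunk[1:]))
    return pairs

field = u"Kanda, Paulo Afonso^1USP^2FMUSP^3CRDC^pBrasil^cSão Paulo^rorg"
pairs = split_subfields(field)
print(json.dumps(pairs, ensure_ascii=False))
```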

ISIS-JSON type 3

{"10":
    [
        {
            "_": "Kanda, Paulo Afonso",
            "1": "USP",
            "2": "FMUSP",
            "3": "CRDC",
            "c": "São Paulo",
            "p": "Brasil",
            "r": "org"
        },
        {
            "_": "Smidth, Magali Taino",
            "1": "USP",
            "2": "FMUSP",
            "3": "CRDC",
            "c": "São Paulo",
            "p": "Brasil",
            "r": "org"
        }
    ],
 "2":
    [
        {
            "_": "538886"
        }
    ]
}
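Type 3 carries the same information as type 2, except that each field's list of (code, value) pairs becomes a JSON object keyed by subfield code. Starting from type 2 pairs, the conversion is essentially `dict(pairs)`, as sketched below. `type2_to_type3` is a hypothetical name, and the sample record is abbreviated from the example above.

One trade-off applies here. A JSON object, like a Python dict, holds one value per key. If a subfield code were repeated inside a single field, this sketch would keep only the last occurrence, whereas a type 2 list preserves all of them. The sketch makes no claim about what isis2json.py does in that case.

```python
def type2_to_type3(record):
    """Turn {tag: [[(code, value), ...], ...]} into {tag: [{code: value}, ...]} (sketch)."""
    # dict(...) over a generator also works on Python 2.6, unlike a dict comprehension
    return dict(
        (tag, [dict(pairs) for pairs in occurrences])
        for tag, occurrences in record.items()
    )

# Abbreviated type 2 record (hypothetical subset of the example above)
type2 = {
    "2": [[("_", "538886")]],
    "10": [[("_", "Kanda, Paulo Afonso"), ("1", "USP"), ("r", "org")]],
}
type3 = type2_to_type3(type2)
print(type3["10"][0]["_"])
```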
[1] See section 4.1 of http://journal.code4lib.org/articles/4893

Dependencies

Under Python, isis2json.py depends on:

  • Python 2.6 or 2.7
  • argparse.py (bundled; also part of the CPython 2.7 distribution)

Under Jython, isis2json.py depends on:

  • Jython 2.5
  • argparse.py (bundled)
  • Bruma.jar on the CLASSPATH (bundled)
  • jyson-1.0.1.jar on the CLASSPATH (bundled)

Example CLASSPATH:

export CLASSPATH=/home/luciano/lib/Bruma.jar:/home/luciano/lib/jyson-1.0.1.jar

Troubleshooting

SyntaxError on "yield fields" when running isis2json.py under Jython

If you see this:

Traceback (innermost last):
  (no code object) at line 0
  File "./isis2json.py", line 84
        yield fields
            ^
SyntaxError: invalid syntax

You are probably running Jython 2.2, an old version that is packaged with several Linux distributions such as Debian and Ubuntu. To verify, type:

$ jython --version
Jython 2.2.1 on java1.6.0_20

To fix, download and install Jython 2.5 or later from Jython.org.

IMPORT ERROR: Jython 2.5 and Bruma.jar are required to read .mst files

Check if Jython 2.5 or later is installed:

$ jython --version
Jython 2.5.2

If it is not, see the issue above. If it is, add the paths to Bruma.jar and jyson-1.0.1.jar to the CLASSPATH environment variable, or pass them with the jython -J-cp command-line option when running isis2json.py, like this:

$ jython -J-cp lib/jyson-1.0.1.jar:lib/Bruma.jar isis2json.py fixtures/LILACS1.mst
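Because .mst input works only under Jython, a script can reject such files early with a clear message instead of failing partway through. Below is a minimal sketch of that kind of guard. It assumes the interpreter is detected through `sys.platform`, which starts with "java" under Jython, and through `sys.version_info`. The function `can_read_mst` is hypothetical, not the check isis2json.py actually performs, and it does not verify that Bruma.jar is on the CLASSPATH.

```python
import sys

def can_read_mst(platform=None, version_info=None):
    """Return True when running on Jython 2.5 or later (hypothetical check)."""
    if platform is None:
        platform = sys.platform          # e.g. 'java1.6.0_20' under Jython
    if version_info is None:
        version_info = sys.version_info
    return platform.startswith("java") and tuple(version_info[:2]) >= (2, 5)

# Under CPython this prints False, so a caller would refuse .mst input
print(can_read_mst())
```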

About

CDS/ISIS to JSON database converter (compatible with CouchDB and MongoDB)


License

LGPL-2.1 (see COPYING.LESSER) and GPL-2.0 (see COPYING).


Languages

  • Python 99.7%
  • Shell 0.3%