OpenClinica User Manual/SAS

Using CDISC ODM 1.3 with OpenClinica Extensions to export to SAS
SAS can import data in a number of formats. For OpenClinica version 3.x the obvious choice, due to its portability and SAS's import facilities, is CDISC ODM 1.3 XML.

The CDISC ODM 1.3 XML with OpenClinica extensions format includes two types of data: Most of the useful data is CDISC ODM 1.3, but some useful data is 'hidden' in OpenClinica extensions sections (e.g. DateOfBirth, Sex and Event Start Dates).
 * 1) CDISC ODM 1.3 specification data (which SAS's libname knows how to import)
 * 2) OpenClinica extensions data (which SAS can be told how to pick up with a mapping file)

SAS doesn't import OpenClinica Extensions by default
OpenClinica extensions aren't part of the CDISC ODM 1.3 specification, and need to be imported separately. The code below uses a mapping file (extensions.xml) to import some, but not all, of the extensions. Other OpenClinica extensions can be picked up from the data file by altering the mapping file appropriately - OpenClinica extensions are tags and attributes in the data file that have the Openclinica prefix, some of which are important. The only place to obtain the Study Subject ID is by using OpenClinica:StudySubjectId - the SubjectKey refers to the internal OID, which doesn't change even if the Study Subject ID does (the OID remains SS_TEST, even if the Study Subject ID is changed to Test 2). The CDISC ODM 1.3 schema generated by OpenClinica is also documented at Openclinica.com/tools/odm-doc.

How to import CDISC ODM 1.3 XML with OpenClinica extensions into SAS
Download the mapping file.

Our init.sas file has pointers to directories containing:
 * 1) The data to be imported
 * 2) The mapping file to import OpenClinica Extension data

We import using the CDISC ODM 1.3 XML using the libname command (proc cdisc is deprecated as of SAS 9.2). The import.sas file first imports the CDISC ODM file, then imports the OpenClinica Extensions.

SAS doesn't like Partial Dates
CDISC ODM 1.3 includes a specification for a data type called partial date, which SAS can't import (it throws an error when you try to read the file). You may choose to not have partial dates in your study and use a text field instead, validated by a regular expression. Alternatively you can modify the XML export before it is imported to change partial dates to text fields (see below).

SAS doesn't import labels
SAS will use a shortened version (truncated to 8 chars) of the ITEM_NAME as the field name, e.g. DMHEIGHT (the SASFieldName attribute of the ItemDef tag in the XML). It will use the full ITEM_NAME itself as the field label (the Name attribute of the ItemDef tag in the XML). In some cases it would be good to pick up the DESCRIPTION_LABEL as the field label (the Comment attribute of the ItemDef tag in the XML). The code below includes a section (commentable) that switched the Name and Comment attributes of ItemDef tags around, so that DESCRIPTION_LABELs are imported as the label.

Field names too long for SAS
SAS doesn't like field names that are longer than 32 characters including the OpenClinica prefix (e.g. I_HTBAS_BLD_1STDOSESTUDYMEDSGIVEN). You may want to use shorter names (e.g. 8 character long) for new studies, but if you have an old study you may need to replace long field names with shortened versions.

OpenClinica (3.0.3 at least) exports badly encoded entities
OpenClinica can export entities that aren't defined in the DTD (see this issuetracker bug - OpenClinica login needed). The script correctly escapes the leading ampersand.

Script to clean up Partial Dates, Labels and Entities
The script below modifies the CDISC ODM XML directly. For CTIMPs (Clinical Trials Involving Medicinal Products) this alters the valuable data you have been collecting, and should be done carefully, with the changes validated (how exactly is up to you).

How to run the code

 * 1) Copy the code below into Notepad (or a similar text editor)
 * 2) Alter the $extractdirectory value to point to your data extracts (paths can be obtained from the Explorer toolbar). This script creates files ending in _new.xml from existing .xml files (and ignores existing files ending in _new.xml).
 * 3) Type cmd into the Windows Start Orb search box (which opens a terminal)
 * 4) Then copy and paste the code into the terminal (right-clicking on the terminal will open context menu from whence you can paste.)

Further information
This OpenClinica forum thread further discussed details of how to import data into SAS.

OpenClinica SPSS to Stata
On a similar note, for those interested in Stata, there's a tool that converts SPSS to Stata here: http://openclinica.mcri.edu.au/DMTools/OpenClinica/SPSStoStata.aspx

Errors when importing into SPSS
When a dataextract is exported in SPSS-format the zip contains 2 files:


 * the data, in tab-delimited format, named .dat
 * an SPSS-syntax file, which can be run to add vaue and variable labels to the data, named .sps

Most of the time the syntax file must be edited, because it assumes that the dat-file is in the default SPSS directory. If this is not the case,

GET DATA /TYPE = TXT/FILE = 'SPSS_DAT2013-04-19-200700043.dat'

must be changed into something like:

GET DATA /TYPE = TXT/FILE = 'c:\temp\SPSS_DAT2013-04-19-200700043.dat'

Also error messages appear referring to dates. The reason why dates are not imported correctly is in the different date formats in the .dat file and the .sps file: The .dat file contains dates in searchable format whereas the .sps file expects American date format. If you change all ADATE to SDATE in the spss syntax file (search&replace) then it should work.

SAS, R and CSV xsl Transformations
Some excellent xsl transform files were made available by Linas Silva on this developers list thread (also available here).

These transformations can be run on full ODM 1.3 xml extracts, with the accompanying powershell scripts. The CSV powershell script is probably the most important as it chops up the output csv into a series of csvs.

Alternatively, the transforms can be run without powershell, instead using the copy of the saxon library that comes with OpenClinica. Example syntax (all on one line):

Each of the xsl files refer to renaming map stylesheet called 'xml_convert_dynamic_lookup' which lists the CRF and Item Group name combinations and what they should be renamed to in the output. If this is not used, the dataset names default to the Item Group OID.

applying the script for SAS
For SAS transfomations, use the script titled powershell_perform_SAS_xsl_transforms.ps1. This is a powershell-script, so you should open a command-prompt as administrator. Change to the directory where you unzipped the script plus the *.xsl-files and start powershell.

Now you have a Powershell-prompt and first you must set the Execution policy:

Running the script is done with one big line, invoking the script with 7 parameters:

The parameters are:
 * "RDP001_ODM_20140311.xml" and this must be the name of your dataextract in ODM1.3-format
 * "xml_convert_sas_map.xsl" is one of the *.xsl-files of the zip, so do not change this one
 * "sasmap_rdp001_20140310.map" is the mapping-file that is created and you can give this any name you want
 * "xml_convert_sas_data.xsl" is one of the *.xsl-files of the zip, so again: do not change this one
 * "data_rdp1_20140311.xml" is the resulting sas-data-file
 * "xml_convert_sas_format.xsl" is one of the *.xsl-files of the zip: no changing this one
 * "formatout_20140310.sas" is a SAS-file with info about radio- and singleselect-options