Required for this procedure:
Notices from Proquest that files are available.
Filezilla.
Proquest FTP login info.
ETD Directory on hard drive with pdf and xml subdirectories.
7-Zip.
Adobe Acrobat Standard. Modify the Acrobat settings: in Acrobat, go to Edit, then Preferences. Click "Documents" in the left-hand column. In the main part of the pop-up, under PDF/A View Mode, use the drop-down to select "Never."
Computer configured to open XML files with WordPad (Right-click an XML file and select "Open with" and then "Choose Program." Select WordPad, then click "Always use the selected program to open this kind of file.").
Editix XML Editor.
XSL file for reformatting the XML files, ETDConversionForDspace.xsl (attached here).
Microsoft Excel with the Developer tab enabled and macros enabled (Left-click the Windows symbol and select "Excel Options." On the Popular tab, check "Show Developer tab in the Ribbon." Go to the Trust Center tab. Click "Trust Center Settings." Click "Enable all macros.").
Excel Template, ETDtempDspace.xlsm, (attached here).
SAF Builder program (downloaded from GitHub and installed by LITS) and Java JDK, Git, and Maven. Oracle VM VirtualBox for running it on Linux, and a directory that can be accessed from both Linux and Windows. Instructions for installation are here: STEPS_rev1.docx. Use the command git clone https://github.com/DSpace-Labs/SAFBuilder to install it.
Collection File program (attached here in a zip file; unzip it and put it in your ETD directory) and Python to run it. It can also be run on the staff eLumin desktop; this procedure includes that method.
For converting video files to MP4s: Avidemux.
...
Open the Excel template ETDtempDspacewithMacrosMASTER.xlsm.
Save the file with a new name for the set you're working on.
Run the macro "Delete_Everything" by using CTRL-X. This will delete the content of sheet1 and any existing XML map. If there is no existing XML map, the macro will raise an error, which can be ignored.
Delete the sheet2 that has old content in it. Create a new worksheet and rename it sheet2 if it is not already named that.
Return to sheet1, cell A1. On the Developer tab, use Import to import the file you created using Editix.
Press Ctrl-R to run the reformatter macro.
Go to sheet2.
Change the header of the author column from dc.creator to dc.contributor.author.
Separate the keywords by changing commas in the keyword field to || where appropriate.
Ensure that there are no spaces in file names. If there are, you'll need to change them in the spreadsheet, and also change the actual file name to match it.
Sort by the departments column. Check for any department that didn't fill in and any that isn't a correct department name. Use the .csv in the Collection File Program to find the definitive version of each department name, and correct any that don’t match it. Also check the dc.relation.ispartof column and correct it. Watch for a dash in
Marine-Estuarine Environmental Sciences |
where it doesn’t belong. Fix it with find and replace in both the department field and the dc.relation.ispartof field, which holds the departmental collection names. Add these find-and-replace operations to the macro so that they don't have to be done manually each time.
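The keyword, file-name, and department checks above can be partly scripted. The following is a minimal Python sketch, not part of the existing macro or the Collection File program. The function names are hypothetical, and it assumes the definitive department names have already been read from the .csv in the Collection File Program. Splitting on every comma is a simplification, since the procedure says to change commas only where appropriate, so the output still needs a manual review.

```python
def split_keywords(field):
    """Replace comma separators in a keyword field with the || separator.

    Assumption: every comma separates two keywords. A keyword that
    legitimately contains a comma must still be fixed by hand.
    """
    parts = [part.strip() for part in field.split(",")]
    return "||".join(part for part in parts if part)


def names_with_spaces(file_names):
    """Return the file names that contain a blank space."""
    return [name for name in file_names if " " in name]


def unmatched_departments(departments, definitive_names):
    """Return department values that are blank or not in the definitive list."""
    known = set(definitive_names)
    flagged = {d for d in departments if d.strip() == "" or d not in known}
    return sorted(flagged)
```

Each function only reports or converts values; renaming the actual files and correcting the spreadsheet cells remain manual steps.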
...
For other problems with the files that Proquest sends us by FTP, ask Michelle to contact Proquest technical support (phone 877-408-5027 or 800-889-3358, email tsupport@proquest.com, or http://support.proquest.com/ ) to find a solution.
Adding Supplements to the Metadata in Excel and Moving Them to the PDF Directory
...
Find its line on the spreadsheet (they are in alphabetical order, but if you don't see it, search for both the author and part of the title). If you can't find it on the spreadsheet, move it to the "not in this set" folder.
Check the title and remember the first couple of words.
Open the publication form. If the publication form file doesn't contain a publication form, or is blank, delete it.
Ensure that the publication form has the correct title. Remember if there's an embargo.
Choose Save As...
Replace any blank spaces in the file name with an underscore. Then replace everything between the author's name and .pdf with Open, e.g.:
...
Search for any blank spaces in license file names and fix them.
Check that all departments are in the collection builder file. Sort and scan.
Check that the rights field labels are dcterms.accessRights.
Check that the author field label is dc.contributor.author.
Fix accessRights issues: Delete the extraneous access rights column. Look for and fix an extra space after dcterms.accessRights in the column with the standard note.
Check that dates are in YYYY-MM-DD (year-month-day) format. After this step is done, do NOT open the file directly in Excel. Instead, import it, selecting "delimited" as the type and "comma" as the delimiter. When you get to step 3 of the import, make sure ALL the columns with dates are set to TEXT.
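The final checks above can also be run as a script that reads the exported file as plain text, so Excel never gets a chance to reformat the dates. This is a minimal sketch under assumptions: the function name and the date_columns parameter are hypothetical, and the caller must supply the header labels of the date columns actually used in the file. Only the labels dcterms.accessRights and dc.contributor.author come from the procedure.

```python
import csv
import io
import re

# A date is accepted only if it looks like YYYY-MM-DD.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_metadata(csv_text, date_columns):
    """Return a list of problems found in comma-delimited metadata text.

    date_columns: header labels of the columns that hold dates
    (hypothetical parameter; supply the labels used in your file).
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    header = rows[0]
    stripped = [label.strip() for label in header]
    problems = []

    # Header labels must match exactly, with no stray spaces.
    for label in header:
        if label != label.strip():
            problems.append(f"header has extra space: {label!r}")
    for required in ("dcterms.accessRights", "dc.contributor.author"):
        if required not in stripped:
            problems.append(f"missing header: {required}")

    # Every non-empty value in a date column must be YYYY-MM-DD.
    for line_no, row in enumerate(rows[1:], start=2):
        for label, value in zip(stripped, row):
            if label in date_columns and value:
                if not DATE_PATTERN.fullmatch(value):
                    problems.append(f"line {line_no}: bad date {value!r} in {label}")
    return problems
```

An empty list means none of these particular problems were found; it does not replace the visual check of the departments and license file names.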
...