Albin O. Kuhn Library & Gallery - Staff Wiki
ScholarWorks ETD Load Procedures
Files for Combining ETD and Zotero Loads
Required for this procedure:
Notices from Proquest that files are available.
Filezilla.
Proquest FTP login info.
ETD Directory on hard drive with pdf and xml subdirectories.
7-Zip.
Adobe Acrobat Standard. Modify Acrobat settings: When you're in Acrobat, go to edit, then preferences. Click on "Documents" in the left-hand column. In the main part of the pop-up, under PDF/A view mode, use the drop-down to select "never."
Computer configured to open XML files with WordPad (Right click an XML file and select "Open with" and then "Choose Program." Select WordPad, then click "Always use the selected program to open this kind of file.").
Editix XML Editor.
XSL file for reformatting the XML files, ETDConversionForDspace.xsl (attached here).
Microsoft Excel with the Developer tab enabled and macros enabled (Left click on the windows symbol and select "Excel Options." On the popular tab, check "Show developer tab in the Ribbon." Go to the Trust Center Tab. Click "Trust Center Settings." Click "Enable all Macros.").
Excel Template, ETDtempDspace.xlsm, (attached here).
SAF Builder program (downloaded from GitHub and installed by LITS) and Java JDK, Git, and Maven. Oracle VM VirtualBox for running it on Linux, and a directory that can be accessed from both Linux and Windows. Instructions for installation here: STEPS_rev1.docx. Use the command git clone https://github.com/DSpace-Labs/SAFBuilder to install it.
Collection File program (attached here in a zip file; unzip it and put it in your ETD directory) and Python to run it. It can also be run on the staff eLumin desktop; this procedure includes that method.
For converting video files to mp4's: Avidemux.
When notified that files have been ftp'ed:
When you receive notices from Proquest, first ensure that they were able to successfully transfer all of the files. If some transfers fail, they'll likely try again the next day. Wait until you likely have all the files for a semester before beginning work on a set. Be sure to keep track of what you have already loaded into ContentDM and what you haven't loaded.
FTP and Unzip the files (about 100 at a time): Downloaded files through
After FTPing the files, be sure to delete them from the FTP server.
Use Filezilla to FTP the new theses and dissertations from Proquest. Open Filezilla. Enter the Proquest FTP IP, the username, and port. Push Enter. The last line of the top box on the screen should say "Directory Listing Successful" and the lower right-hand portion of the screen should be populated with files on the Proquest server. The left side of the screen shows your computer; find the ETD folder on your hard drive. Use the date to identify the new files that we need to obtain. Highlight all of the files we need by holding down the Shift key while clicking the first and last files you want highlighted. Drag them to your ETD folder. The progress of the file transfer will show at the bottom of the screen. Wait while all files transfer (you can minimize Filezilla and do something else).
Verify that you have all of the files that were sent by checking the number of files against the number of files the e-mail notices said were successfully downloaded. Add up the successful downloads from all of the notices. Highlight all the ETD files, right click, and select Properties. The total from the notices should match the number of files shown here.
Use 7-Zip to extract the zip files. Open 7-Zip. The folder that your files are in should be selected in the bar across the top of the window. If not, use the drop-down arrow to find it. Once you are in the correct folder, all of your zip files should display in the window. Highlight all of the zip files by holding down the Shift key while clicking the first and last files you want highlighted. Click Extract. The destination for the extract opens to C:\ETD\ZIP*\. Delete the *\ so that the files all go into the ZIP folder. Click OK.
Use Windows Explorer to sort the files by going to the "View" menu and selecting "arrange by file type." Select all of the files of a given type, and move them to the appropriate sub-folder: Highlight all of the PDF files by holding down the Shift key while clicking the first and last files you want highlighted. Drag them to the PDF subfolder and drop them there (or alternately, copy and paste them). Highlight all of the XML files by holding down the Shift key while clicking the first and last files you want highlighted. Drag them to the XML subfolder and drop them there (or alternately, copy and paste them).
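For anyone who would rather script the count, extract, and sort steps above, the following is a minimal Python sketch, not part of the established procedure. It assumes the downloaded zip files sit directly in the ETD directory, that extracted files go into a ZIP subfolder, and that the pdf and xml subdirectories live directly under the ETD directory. The function name extract_and_sort and the expected_count parameter are made up for this illustration. Files with the same name coming from different zip files would overwrite each other, so check the results by hand.

```python
import shutil
import zipfile
from pathlib import Path


def extract_and_sort(etd_dir, expected_count=None):
    """Extract every .zip in etd_dir into etd_dir/ZIP, then sort PDF and XML files."""
    etd = Path(etd_dir)
    zips = sorted(etd.glob("*.zip"))

    # Compare the number of zip files with the total from the e-mail notices.
    if expected_count is not None and len(zips) != expected_count:
        raise ValueError(f"expected {expected_count} zip files, found {len(zips)}")

    zip_dir = etd / "ZIP"
    zip_dir.mkdir(exist_ok=True)
    for zip_path in zips:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(zip_dir)

    # Move extracted files into the pdf and xml subdirectories by extension.
    moved = {"pdf": 0, "xml": 0}
    extracted = [p for p in zip_dir.rglob("*") if p.is_file()]
    for kind in moved:
        target = etd / kind
        target.mkdir(exist_ok=True)
        for path in extracted:
            if path.suffix.lower() == "." + kind:
                shutil.move(str(path), str(target / path.name))
                moved[kind] += 1
    return moved
```

Calling extract_and_sort(r"C:\ETD", expected_count=100) would raise an error if the number of zip files does not match the total from the notices, and otherwise returns how many PDF and XML files were moved.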
Transform the Metadata
Combine the XML files into 1 File
DOS prompt:
Click the Windows Start button and type cmd in the box. Push Enter. A Command Prompt (DOS) window will open.
Change to the directory where you want the new file to go by entering cd followed by the path for the directory. For example, "CD C:\ETD" changes the directory to the ETD directory. To go up one level, enter "CD ..". To go to the root directory, enter "cd \".
To copy the individual xml metadata files, use copy path *.xml newfilename. For example, if your xml files are in the ETD\xml\ directory, “copy c:\ETD\xml\*.xml combined.xml”.
Notepad:
Open the new file in Notepad. Copy <?xml version="1.0" encoding="iso-8859-1"?> from the beginning of the file. Use Find and Replace to replace every occurrence of it with nothing: paste <?xml version="1.0" encoding="iso-8859-1"?> into the find box and leave the replace box empty. Then put <?xml version="1.0" encoding="iso-8859-1"?> back at the beginning of the file, inserting a line break between it and the remainder of the XML.
Add <ETD> after the <?xml version="1.0" encoding="iso-8859-1"?> at the beginning, with a line break between it and the remainder of the XML.
At the end of the file, add a line break and then </ETD>.
TEST: finding all paragraph marks and replacing them with nothing.
Save and close the file.
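The DOS and Notepad steps above (combine the files, keep a single XML declaration, and wrap everything in <ETD>) can also be sketched in Python. This is an illustration under assumptions, not the procedure's official method: the function name combine_xml is made up, and the sketch assumes every source file uses the iso-8859-1 declaration shown above. Write the output outside the xml folder so it is not picked up as an input on a later run.

```python
import re
from pathlib import Path

XML_DECL = '<?xml version="1.0" encoding="iso-8859-1"?>'
# Matches an XML declaration such as the one above, plus trailing whitespace.
DECL_PATTERN = re.compile(r"<\?xml\s[^>]*\?>\s*")


def combine_xml(xml_dir, out_path, encoding="iso-8859-1"):
    """Concatenate all .xml files in xml_dir under a single <ETD> root element."""
    parts = []
    for path in sorted(Path(xml_dir).glob("*.xml")):
        text = path.read_text(encoding=encoding)
        # Drop each file's own declaration; only one is allowed per document.
        parts.append(DECL_PATTERN.sub("", text).strip())

    # One declaration, then <ETD>, the records, and </ETD>, each on its own line.
    combined = "\n".join([XML_DECL, "<ETD>", *parts, "</ETD>"]) + "\n"
    Path(out_path).write_text(combined, encoding=encoding)
    return len(parts)
```

For example, combine_xml(r"C:\ETD\xml", r"C:\ETD\combined.xml") would write combined.xml in the ETD directory and return the number of files it merged, which can be checked against the number of XML files received.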
Reformat the XML File using Editix:
Open Editix.
Open the XSL file ETDConversionForDspace.xsl (go to File, Open, then change the file type to XSLT 2.0 document (*.xsl *.xslt)).
Go to XSLT/XQuery and select "Transform a document."
In XML source find your XML file.
In result find the directory you want the new file to go in and type the name with the extension .xml
Click ok.
Prepare the metadata in Excel:
Open the Excel template ETDtempDspacewithMacrosMASTER.xlsm.
Save the file with a new name for the set you're working on.
Run the macro "Delete_Everything" by using CTRL-X. This will delete the content of sheet1 and any existing XML map. If there is no existing XML map, it will produce an error, which can be ignored.
Delete the sheet2 that has old content in it. Create a new worksheet and rename it sheet2 if not named that already.
Return to sheet1, cell A1. On the Developer tab, use Import to import the file you created using Editix.
Press CTRL-R to run the reformatter macro.
Go to sheet2.
Change the header of the author column from dc.creator to dc.contributor.author
Separate the keywords by changing commas in the keyword field to || where appropriate.
Ensure that there are no spaces in file names. If there are, you'll need to change them in the spreadsheet, and also change the actual file name to match it.
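The keyword and file-name checks in the last two steps can be previewed with a small Python sketch before editing the spreadsheet. Both function names are made up for illustration. split_keywords treats every comma as a separator, so a keyword that legitimately contains a comma still needs a manual look. names_with_spaces only reports file names and does not rename anything.

```python
from pathlib import Path


def split_keywords(value):
    """Turn a comma-separated keyword string into a ||-separated one."""
    terms = [term.strip() for term in value.split(",")]
    return "||".join(term for term in terms if term)


def names_with_spaces(folder):
    """Return the names of files in folder that contain a space."""
    return sorted(
        path.name
        for path in Path(folder).iterdir()
        if path.is_file() and " " in path.name
    )
```

For example, split_keywords("ecology, marine biology") returns "ecology||marine biology", and names_with_spaces(r"C:\ETD\pdf") lists the PDF files whose names need fixing in both the folder and the spreadsheet.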
Sort by the departments column. Check for any department that didn't fill in and any that aren't the correct department names, using the .csv in the Collection File Program to find the definitive version of each department name, and correct any that don't match it. Also check the dc.relation.ispartof column and correct it. Watch for a dash in
Marine-Estuarine Environmental Sciences |