Required for this procedure:

...

  • Check for anything unexpectedly left over. ETDs with extra files will unzip into folders or into additional zip files, and these need some initial preparation. Look at the extra files and handle each case as follows:
    Approval sheets--These are forms that the adviser signs to approve the thesis or dissertation. They are extraneous and should simply be deleted. Move the pdf and xml files to your usual directories and process as usual.
    Other data that is usually included in the main file, and pdf appendices--Combine the extra files with the main file using Adobe Acrobat. Click "Combine", then click "Merge Files Into a Single PDF". Click "Add Files" and select the ones you want to combine. Put the files in the order in which they should be combined. Click "Combine Files" and overwrite the main PDF with the new one. Move the pdf and xml files to your usual directories and process as usual.
    Non-pdf appendices and other files not meeting the above criteria--Convert them to the most appropriate file format given here: http://aok.lib.umbc.edu/scholarworks/NonProprietaryFileFormats.pdf (if not already in one of these formats). If a file can't be satisfactorily converted to one of those formats, leave it in its current format. Note the name(s) of the file(s), along with the author and title of the main document, so that you can find the record and add the file(s) to your spreadsheet later. Put these files in a supplement folder in your pdf folder.
  • Open each PDF. If the PDF has a publishing rights form, extract it from the document (Document menu, "Extract Pages"; enter the page number of the publishing rights form, click "delete pages", and click "extract pages as a separate file"), saving it to the ETDPermission folder on the shared drive with the file name last name_first name_year_ETDPermission. In the original ETD document, delete everything left in front of the abstract using the Delete Pages command in the Document menu. (Find the last page before the abstract and note the page number. Click "Document", then "Delete Pages". Enter the page range you need to delete, then click "OK" and then "Yes".) Save the actual ETD file with the same filename, overwriting the original file. Put documents that we have the right to publish in an "Open" directory. Put documents that we don't have the rights to in a "Closed" directory. Sort the metadata to match those groupings.
  • Some files will be missing the thesis or dissertation. Handle these as follows:
    Missing documents--There are two reasons a thesis or dissertation may be missing. The document may be embargoed, or it may not have been FTP'ed because it is a large file that couldn't be sent via the Proquest administration page and so was sent to Proquest on disk. To determine which case applies, look at the DISS_submission publishing_option tag in the metadata. This is usually the first field in the metadata. That tag contains a numeric embargo code:
    "0" - No embargo
    "1" - 6 month embargo
    "2" - 1 year embargo
    "3" - 2 year embargo
    "4" - Until specified date
    If the code is 0, we should have the file, and can obtain it by downloading from the ContentDM Administrator Resources & Guidelines page at http://www.etdadmin.com/cgi-bin/main/resources?siteId=75. Click on Dissertations & Theses @ University of Maryland, Baltimore County and search for the missing document. When you find it, download it and process as usual.
    If the code is 1-4, the document is embargoed and we won't receive the document until the embargo period has passed.
    At the end of the metadata file there is a DISS_sales_restriction code, and the date in that tag indicates when the embargo will expire and when we should receive the file. Note the file name and the embargo expiration date in the embargo list at the end of this procedure so that we can make sure we receive the file when the time comes. When you process the metadata for embargoed documents in Excel, insert a note into the metadata for the document stating: "At the author's request, this dissertation isn't being made available at this time." The metadata is then uploaded as usual along with the title page. The metadata will be revised to remove this note when we receive the full file.
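
The embargo check above can also be sketched in Python, as a rough aid rather than a replacement for reading the metadata. This sketch assumes the code is stored in an attribute named embargo_code on the DISS_submission element; that attribute name and the function name embargo_status are assumptions, so check them against a real metadata file before relying on the result.

```python
import xml.etree.ElementTree as ET

# Meanings of the embargo codes, as listed in this procedure.
EMBARGO_CODES = {
    "0": "No embargo",
    "1": "6 month embargo",
    "2": "1 year embargo",
    "3": "2 year embargo",
    "4": "Until specified date",
}


def embargo_status(xml_text, attribute="embargo_code"):
    """Return (code, meaning) for one metadata record.

    ASSUMPTION: the code is an attribute of the DISS_submission element.
    The attribute name is a guess and can be overridden by the caller.
    """
    root = ET.fromstring(xml_text)
    # The record's root may be DISS_submission itself or may contain it.
    if root.tag == "DISS_submission":
        node = root
    else:
        node = root.find(".//DISS_submission")
    if node is None:
        raise ValueError("no DISS_submission element found")
    code = node.get(attribute)
    return code, EMBARGO_CODES.get(code, "Unknown code")


if __name__ == "__main__":
    # Hypothetical, heavily shortened record used only for illustration.
    sample = '<DISS_submission publishing_option="0" embargo_code="2"/>'
    print(embargo_status(sample))
```

A code of 0 means the file should be available to download; any of 1 through 4 means the document goes on the embargo list.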

For other problems with the files Proquest FTPs to us, ask Michelle to call Proquest technical support at 877-408-5027 or 800-889-3358 (or email tsupport@proquest.com or visit http://support.proquest.com/) to find a solution.

 

Combine the XML files into 1 File

DOS prompt:

  • Click the Windows Start button and type cmd in the box. Press Enter. A DOS window will open.
  • Change to the directory where you want the new file to go by entering cd followed by the path for the directory. For example, "CD C:\ETD" changes the directory to the ETD directory.
  • To copy the individual xml metadata files into one file, use copy path\*.xml newfilename. For example, if your xml files are in the ETD\xml\ directory, enter "copy c:\ETD\xml\*.xml combined.xml".

Notepad:

  • Open the new file in Notepad. Copy <?xml version="1.0" encoding="iso-8859-1"?> from the beginning of the file. Use Find and Replace to remove every occurrence: paste <?xml version="1.0" encoding="iso-8859-1"?> into the "Find what" box and leave "Replace with" empty. Then put <?xml version="1.0" encoding="iso-8859-1"?> back at the beginning of the file, inserting a line break between it and the remainder of the XML.
  • Add <ETD> after the <?xml version="1.0" encoding="iso-8859-1"?> at the beginning, with a line break between it and the remainder of the XML.
  • At the end of the file, add a line break and </ETD>.
  • Save and close the file.
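
The DOS and Notepad steps above can also be sketched as a short Python script. This is only an illustration of the same idea (strip the XML declaration from every record, then wrap all records in a single <ETD> element under one declaration), not a tested replacement for the manual steps. The function names combine_records and combine_directory are made up for this sketch, and it assumes every metadata file is encoded as iso-8859-1, as the declaration states.

```python
import re
from pathlib import Path

DECLARATION = '<?xml version="1.0" encoding="iso-8859-1"?>'
# Matches any XML declaration, so small spacing differences don't matter.
DECLARATION_RE = re.compile(r"<\?xml\s[^>]*\?>\s*")


def combine_records(records):
    """Join XML record strings into one document wrapped in <ETD>."""
    bodies = [DECLARATION_RE.sub("", text).strip() for text in records]
    return "\n".join([DECLARATION, "<ETD>", *bodies, "</ETD>"]) + "\n"


def combine_directory(xml_dir, output_path):
    """Combine every .xml file in xml_dir into output_path.

    ASSUMPTION: all source files are iso-8859-1, per their declaration.
    Returns the number of files that were combined.
    """
    output_path = Path(output_path)
    sources = sorted(
        p for p in Path(xml_dir).glob("*.xml")
        if p.resolve() != output_path.resolve()  # skip an old combined file
    )
    records = [p.read_text(encoding="iso-8859-1") for p in sources]
    output_path.write_text(combine_records(records), encoding="iso-8859-1")
    return len(sources)
```

For the directories used in the examples above, the call would be combine_directory(r"C:\ETD\xml", r"C:\ETD\combined.xml"). Open the result in Notepad afterwards to confirm that it starts with the declaration and <ETD> and ends with </ETD>.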

Reformat the XML File using Editix:

  • Open Editix.
  • Open the XSL file ETDConversionForDspace.xsl (go to File, Open, then change the file type to XSLT 2.0 document (*.xsl *.xslt)).
  • Go to XSLT/Xquery, then transform a document.
  • In XML source, find your XML file.
  • In result, find the directory you want the new file to go in and type the name with the extension .xml.
  • Click ok.

Prepare the metadata in Excel:

  • Open the Excel template ETDtempDspace.xlsm.
  • Run the macro "Delete_Everything" by using CTRL-X. This will delete the content of sheet1 and any existing XML map. If there is no existing XML map, it will raise an error, which can be ignored.
  • Delete the sheet2 that has old content in it. Create a new worksheet and rename it sheet2 if it is not already named that.
  • Return to sheet1, cell A1. Use Import on the Developer tab to import the file you created using Editix.
  • Press CTRL-R to run the reformatter macro.
  • Go to sheet2.
  • Separate the keywords by changing commas in the keyword field to || where appropriate.
  • Ensure that there are no spaces in file names. If there are, you'll need to change them in the spreadsheet, and also change the actual file name to match.
  • Check for any department that wasn't filled in. If there are any, notify Michelle and wait until she tells you what department to use and has programming (the Collection File program may not work).

Adding Supplements to the metadata in Excel:

  • In the filename column, enter the names of any extra files to be loaded in the appropriate line. Separate each from the existing file with ||. In the dc.description column for these, add a note indicating that there's a supplement and its format, e.g. "Include 1 .jpeg3 supplement". Move the supplement from the supplement folder to the pdf folder after it has been added to the metadata.

Adding Licenses to the metadata in Excel:

  • Scan each license. 
  • Use the naming convention author's last name, first initial, OPEN (e.g., JonesrOPEN.pdf), or author's last name, first initial, LIM (JonesrLIM.pdf), depending on whether access is supposed to be open or limited, and save to the pdf folder.
  • Note any licenses that include embargo periods, including the author's last name and initial and the length of the embargo.
  • In the metadata file, move the files from column A to the Open or Limited access column as appropriate.
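
As a small illustration of the license naming convention above, the file name can be built like this. The function name license_filename is made up for this sketch. Lowercasing the initial follows the JonesrOPEN.pdf example, and removing spaces from the last name is an assumption based on the rule elsewhere in this procedure that file names must not contain spaces.

```python
def license_filename(last_name, first_name, open_access):
    """Build a scanned-license file name such as JonesrOPEN.pdf."""
    suffix = "OPEN" if open_access else "LIM"
    # ASSUMPTION: drop spaces, since file names must not contain them.
    surname = "".join(last_name.split())
    # The examples in this procedure use a lowercase first initial.
    initial = first_name.strip()[0].lower()
    return f"{surname}{initial}{suffix}.pdf"


if __name__ == "__main__":
    print(license_filename("Jones", "Robert", open_access=True))
    print(license_filename("Jones", "Robert", open_access=False))
```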

Saving the Excel file as a .csv file:

  • Delete all of the rows where extra data was filled in.
  • Save your sheet2 (you must be on it) as a .csv file.
  • Be sure to close Excel or the next steps won't work.

Run the SAF builder:

  1. Put the .csv metadata file in the same directory as all of the files to be loaded.
  2. Go to the DOS prompt: in the Start menu, click "Run," then type cmd in the box, and click OK.
  3. Change the directory to the safbuilder directory by typing cd c:\safbuilder
  4. Run the safbuilder by typing "safbuilder.sh -c" followed by the path to the metadata file. For example, "safbuilder.sh -c c:\ETD\metadata.csv" would run the safbuilder on the metadata.csv file and all of the files in the directory with it.
  5. The program will make a lot of text appear in the DOS window. If it doesn't, the program didn't run. If it says ERROR in square brackets, the program didn't run. You probably made a typo when you typed the run command. Try again, and be sure to type it all correctly.
  6. When it has run correctly, the last line in the DOS window should indicate that ETDtempDspace.csv has been used 0 times, and that should be the only line with a "File:" error. See below:

    A SimpleArchiveFormat directory should appear in the folder that contains the files and the csv file.
  7. If there is more than the one "File" error, there is something wrong. See below:

    These errors happen when the files in the folder and the filenames in the csv file don't match. Determine whether there is a problem that needs to be corrected by comparing your .csv file to the contents of your directory. If necessary, make the corrections, then delete your SimpleArchiveFormat directory and run the safbuilder again. If you can't fix the problems, or don't know what's causing them, ask Michelle for help. If she's not there, record the errors in Word: push the PrtScn and Ctrl keys together to copy your screen to the clipboard, then paste it into Word. If there are many errors, scroll through them and repeat until they are all pasted into Word.
  8. If other errors occur, it's usually because of a typo in the command/path. Try to run it again.
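
The comparison of the .csv file to the directory contents described in step 7 can be sketched in Python as follows. This is a rough helper under stated assumptions, not part of the safbuilder: the function name check_csv_against_directory is made up, the column header is assumed to be exactly "filename", and the csv encoding is a guess that may need changing depending on how Excel saved the file.

```python
import csv
from pathlib import Path


def check_csv_against_directory(csv_path, column="filename",
                                encoding="utf-8-sig"):
    """Compare file names listed in the csv with the files beside it.

    ASSUMPTIONS: the column header is "filename" and the csv is UTF-8;
    pass a different column or encoding (e.g. "cp1252") if needed.
    Returns three sorted lists: names listed but not present, files
    present but not listed, and listed names that contain spaces.
    """
    csv_path = Path(csv_path)
    listed = set()
    with open(csv_path, newline="", encoding=encoding) as handle:
        for row in csv.DictReader(handle):
            # Several files in one cell are separated by "||".
            for name in (row.get(column) or "").split("||"):
                if name.strip():
                    listed.add(name.strip())
    present = {
        p.name for p in csv_path.parent.iterdir()
        if p.is_file() and p.name != csv_path.name
    }
    return {
        "missing_from_directory": sorted(listed - present),
        "not_in_csv": sorted(present - listed),
        "names_with_spaces": sorted(n for n in listed if " " in n),
    }
```

For the example path used above, check_csv_against_directory(r"c:\ETD\metadata.csv") returns the three lists. If all three are empty, the csv and the directory agree, and any remaining errors are more likely a typo in the command or path.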

...