Boulder OneStop presentation
netCDF to ISO XML workflow
Thomas Jaensch, Silver Spring
Steps from netCDF to ISO XML
1. Run ncdump -x to extract metadata from the netCDF file and output XML (NcML) instead of CDL
2. Append additional data not included in the NcML to the .ncml file so it can be included in the ISO metadata later on, e.g. file name, file size, path to the data files on the network/WAF, browse graphic info
3. Run an XSLT transform to write the data extracted from the .ncml file to an ISO XML file
4. Run additional XSLT transform(s) to add further information, e.g. collection-level keywords to granule metadata
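As a rough illustration of steps 1 to 3, here is a minimal Go sketch. It is not the nc2iso script itself. The helper names (ncdumpCmd, appendAttrs, xsltCmd), the file names, and the choice to store the extra data as NcML attribute elements before the closing netcdf tag are all assumptions made for the example. The two external commands are only built and printed, not run.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"errors"
	"fmt"
	"os/exec"
	"strings"
)

// extraAttr is a hypothetical name/value pair for data that is not in the
// NcML produced by ncdump (file name, file size, and so on).
type extraAttr struct {
	Name, Value string
}

// ncdumpCmd builds the step 1 command: "ncdump -x" writes NcML to stdout.
func ncdumpCmd(ncFile string) *exec.Cmd {
	return exec.Command("ncdump", "-x", ncFile)
}

// appendAttrs covers step 2 under the assumption that the extra data is
// stored as <attribute> elements inserted before the closing </netcdf> tag.
func appendAttrs(ncml string, attrs []extraAttr) (string, error) {
	idx := strings.LastIndex(ncml, "</netcdf>")
	if idx < 0 {
		return "", errors.New("no closing </netcdf> tag found")
	}
	var buf bytes.Buffer
	buf.WriteString(ncml[:idx])
	for _, a := range attrs {
		buf.WriteString(`  <attribute name="`)
		if err := xml.EscapeText(&buf, []byte(a.Name)); err != nil {
			return "", err
		}
		buf.WriteString(`" value="`)
		if err := xml.EscapeText(&buf, []byte(a.Value)); err != nil {
			return "", err
		}
		buf.WriteString("\"/>\n")
	}
	buf.WriteString(ncml[idx:])
	return buf.String(), nil
}

// xsltCmd builds the step 3 command: xsltproc applies the stylesheet to the
// .ncml file and writes the result to isoFile.
func xsltCmd(stylesheet, ncmlFile, isoFile string) *exec.Cmd {
	return exec.Command("xsltproc", "-o", isoFile, stylesheet, ncmlFile)
}

func main() {
	// A stripped-down NcML document standing in for real ncdump output.
	ncml := "<netcdf xmlns=\"http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2\">\n</netcdf>\n"
	out, err := appendAttrs(ncml, []extraAttr{
		{"file_name", "example.nc"},
		{"file_size", "1024"},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(strings.Join(ncdumpCmd("example.nc").Args, " "))
	fmt.Print(out)
	fmt.Println(strings.Join(xsltCmd("ncml2iso.xsl", "example.ncml", "example.xml").Args, " "))
}
```

To actually execute a step, one would call Output() or Run() on the returned command, which requires ncdump and xsltproc to be installed.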
Bash scripts to create ISO XML from netCDF
- Main nc2iso script that performs the steps described in the previous slide
- Helper script that runs the main nc2iso script
Not horrible but...
- Sequential code with a tendency to become cryptic and messy in no time
- Not easy to understand someone else’s code
- Not easy to scale and extend
- Hard to test
Hello Go!
Why Go?
- It’s fast
- Scales well
- Automated code formatting
- Compiled binaries run anywhere
- Great standard library
- Testing framework baked into the language
- Concurrency baked into the language
- Well documented
- Go playground to test stuff in the browser while developing
- Makes it easy to write well documented, easy-to-reason-about code
Code organization
- Code is organized into loosely coupled functions that can be placed wherever it makes sense, since Go is a compiled language and the compiler doesn’t care about declaration order (for the most part)
- Easy to extend functionality by adding/removing functions that do specific things without breaking the whole thing
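A small sketch of what this can look like. The function names and the file path are made up for illustration. main sits at the top of the file and calls helpers that are declared further down, and each helper does one specific thing.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

func main() {
	// describe is declared below; package-level declaration order does not matter.
	fmt.Println(describe("/data/woa13/temp.nc")) // hypothetical path
}

// describe combines the two helpers below into a one-line summary.
func describe(path string) string {
	return fmt.Sprintf("%s -> %s", baseName(path), isoName(path))
}

// isoName derives the name of the ISO XML file from the data file name.
func isoName(path string) string {
	return strings.TrimSuffix(baseName(path), filepath.Ext(path)) + ".xml"
}

// baseName strips the directory part of a path.
func baseName(path string) string {
	return filepath.Base(path)
}
```

Adding another helper, or swapping out isoName, leaves the other functions untouched.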
Concurrency
- Run multiple processes at the same time and cut down on overall runtime
- In theory it scales indefinitely; in practice the hardware sets the limit
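One common way to do this in Go is a small pool of goroutines fed from a channel. The sketch below assumes a hypothetical per-file function, convert, standing in for the real work (for example running the netCDF to ISO steps on one file). The worker count is the knob that the hardware ultimately limits.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// convert is a hypothetical stand-in for the work done on a single file.
func convert(name string) string {
	return strings.TrimSuffix(name, ".nc") + ".xml"
}

// processAll applies fn to every input using a fixed number of worker
// goroutines. Results are returned in the same order as the inputs.
func processAll(inputs []string, workers int, fn func(string) string) []string {
	if workers < 1 {
		workers = 1
	}
	results := make([]string, len(inputs))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each index is handed to exactly one worker, so no two
			// goroutines ever write to the same slice element.
			for i := range jobs {
				results[i] = fn(inputs[i])
			}
		}()
	}

	for i := range inputs {
		jobs <- i
	}
	close(jobs) // lets the workers' range loops finish
	wg.Wait()
	return results
}

func main() {
	files := []string{"a.nc", "b.nc", "c.nc"}
	for _, out := range processAll(files, 2, convert) {
		fmt.Println(out)
	}
}
```

With work that mostly waits on disk, network or external commands, raising the worker count tends to shorten the overall runtime until some hardware resource is saturated.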
Testing
Package testing provides support for automated testing of Go packages.
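A minimal sketch of a table-driven test, assuming a hypothetical helper isoFileName. By convention the test function lives in a file whose name ends in _test.go and is run with "go test"; it is shown next to the function under test here only to keep the example in one piece.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// isoFileName is a hypothetical helper: it maps a netCDF file name to the
// name of the ISO XML file generated from it.
func isoFileName(ncFile string) string {
	return strings.TrimSuffix(ncFile, ".nc") + ".xml"
}

// TestISOFileName checks isoFileName against a table of inputs and expected
// outputs. "go test" picks up any function named TestXxx that takes a
// *testing.T, provided it is in a _test.go file.
func TestISOFileName(t *testing.T) {
	cases := []struct{ in, want string }{
		{"temp.nc", "temp.xml"},
		{"a.b.nc", "a.b.xml"},
		{"noext", "noext.xml"},
	}
	for _, c := range cases {
		if got := isoFileName(c.in); got != c.want {
			t.Errorf("isoFileName(%q) = %q, want %q", c.in, got, c.want)
		}
	}
}

func main() {
	fmt.Println(isoFileName("temp.nc"))
}
```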
OneStop datasets I’m working on in Silver Spring
Granule metadata processes are in place for the following datasets:
- World Ocean Atlas 2013 (about 700 file level granules), batch-processing time for all files about 5 minutes (after editing XSLT or adding additional info to NcML)
- C-MAN (NDBC Coastal-Marine Automated Network and moored weather buoys), about 10000 file level granules and counting, batch-processing time for all files about 10 minutes
- CO-OPS (Center for Operational Oceanographic Products and Services), about 10000 file level granules and counting, batch-processing time for all files about 10 minutes
- Quality-Controlled Underway Oceanographic and Meteorological Data from the Center for Ocean-Atmospheric Prediction Studies (COAPS) - Shipboard Automated Meteorological and Oceanographic System (SAMOS), about 110000 file level granules, batch-processing time for all files about 10 hours
XML Linkchecker cmd tool
WHAT DOES IT DO?
Checks all http:// and https:// links in the XML files in the directory the tool is run in, reports the server response received for each link, and logs the failed responses (everything other than "200 OK") to a linkchecker_bad_links_log file in the current working directory
PREREQUISITES
The program uses Bash and cURL commands under the hood and will not work if Bash and/or cURL are not installed in the environment it's run in
HOW TO RUN IT?
- Drop the binary suitable for your system (Mac or Linux) into the folder with the XML files you want to check for broken links
- Open a shell (PuTTY, Mac terminal, whatever) and navigate to the folder where you dropped the linkchecker binary
- Run "chmod +x linkchecker" to make the binary executable
- Run "./linkchecker" and if it works you should be able to see what it's doing in your shell and end up with a linkchecker_bad_links_log file in the current working directory after it's done
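For readers curious how such a check could look in pure Go, here is a minimal sketch. It is not the linkchecker tool itself, which relies on Bash and cURL; this version swaps in the standard library's net/http client and a simple regular expression instead. The function names, the 10 second timeout and the log line format are assumptions. Only the log file name is taken from the description above.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"path/filepath"
	"regexp"
	"strings"
	"time"
)

// linkPattern matches http:// and https:// URLs up to the next whitespace,
// quote or angle bracket. This is a simplification: it does not decode XML
// entities such as &amp; inside URLs.
var linkPattern = regexp.MustCompile(`https?://[^\s"'<>]+`)

// extractLinks returns the distinct links found in xmlText, in order of
// first appearance.
func extractLinks(xmlText string) []string {
	seen := map[string]bool{}
	var links []string
	for _, l := range linkPattern.FindAllString(xmlText, -1) {
		if !seen[l] {
			seen[l] = true
			links = append(links, l)
		}
	}
	return links
}

// checkLink requests url and reports the response status and whether it was
// 200 OK. Note that http.Client follows redirects by default.
func checkLink(client *http.Client, url string) (string, bool) {
	resp, err := client.Get(url)
	if err != nil {
		return "error: " + err.Error(), false
	}
	defer resp.Body.Close()
	return resp.Status, resp.StatusCode == http.StatusOK
}

func main() {
	files, err := filepath.Glob("*.xml")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	client := &http.Client{Timeout: 10 * time.Second}
	var bad []string
	for _, f := range files {
		data, err := os.ReadFile(f)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		for _, link := range extractLinks(string(data)) {
			status, ok := checkLink(client, link)
			fmt.Printf("%s: %s: %s\n", f, link, status)
			if !ok {
				bad = append(bad, fmt.Sprintf("%s: %s: %s", f, link, status))
			}
		}
	}
	// Write the log only if at least one link failed.
	if len(bad) > 0 {
		log := strings.Join(bad, "\n") + "\n"
		if err := os.WriteFile("linkchecker_bad_links_log", []byte(log), 0644); err != nil {
			fmt.Println("error:", err)
		}
	}
}
```

The links are checked one after another here. Since each check mostly waits on the network, this is a natural place to use goroutines.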
Etc.
Tools I use at NOAA
Bash, Git, Go, Linux, Mac, Oxygen, PuTTY, Sublime Text, xsltproc, XML, XPath, XSLT, Windows
Goals
Further improve my workflows and programs, collaborate more and learn about other developers’ workflows
Wishes
Shared GitHub account for better code collaboration