Spring 2025 Syllabus (Schedule)
Classes meet M W F 10:10 - 11am in Lilley Library 006.
The syllabus contains a detailed explanation of course policies and the basis for grades.
This link jumps to the closest day to today's date. Review the schedule as we get
started to get a sense of how this course will work on a daily basis.
All the Tools You Need As We Begin:
Download and install the following software on your own personal computer(s) on or
before the first day of class. These software tools are available in our campus computing
labs, too.
- <oXygen/>.
(You will probably have this installed from DIGIT 100 or 110.) The DIGIT program has purchased a site license for this software, which
is installed in Burke 153, Kochel 77, Witkowski 109, the Lilley Library computers, and the computer labs in Hammermill. The license also permits
students enrolled in the course to install the software on their home computers (for course-related use only).
When installing it on your own computer, you will need the license key, which we have posted in the Announcements section of our course Canvas site.
- AntConc: (You may have this installed from DIGIT 100.)
Free corpus text analysis tool.
- We will ask you to install Python version 3.8 or higher on your computer, and to install PyCharm Edu to assist in learning and
writing Python code with syntax checking. Follow the instructions and links in the PyCharm quick start guide ( https://www.jetbrains.com/help/pycharm/quick-start-guide.html#meet ), paying attention to what you need for your own computer system.
Feel free to download and explore PyCharm Edu on your own before we start
working with it together: https://www.jetbrains.com/pycharm-edu/. Also, configure
Anaconda so it is available to work within PyCharm, following this guide: https://www.jetbrains.com/help/pycharm/conda-support-creating-conda-virtual-environment.html. (We will provide guidance on this in class.)
- Zoom: Make sure your Zoom installation is up to date and you are ready to
connect. Sometimes we will record portions of class meetings and tutorial sessions over Zoom to share for future reference. Look for these in Canvas Announcements and use the Zoom menu option in Canvas to access these meetings.
- We will use GitHub for sharing code and for project management. Create an account (choose the free options) at https://github.com and install the GitHub client software for your operating system on your own computer. (We will explain how to use git and GitHub in our course.)
- We will use the Slack chat platform for discussion and for asking questions (see https://slack.com/help/articles/218080037-Getting-started-for-new-members). Download and install the Slack client, configuring your account to use your Penn State email address (the official address, which looks like xyz123@psu.edu, and not an alias based on your name that you may have set up), so you can join our Slack workspace: DIGIT-coders. When you receive an invitation to join this workspace, you should accept.
- Later in the semester we may ask you to install a local copy of the eXist-db
XML database, which you can download from https://exist-db.org/.
- Not much coding experience? Don’t worry! Past students in this course
who had never seen anything like markup or XML code have designed projects (like these) and even spoken about them at academic conferences! You will learn to develop
your own digital tools and to manage digital projects as a team.
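Once Python is installed, the version requirement in the software list above (3.8 or higher) can be checked with a short script. This is only a minimal sketch and not an official course file: the helper name `meets_minimum` is a hypothetical name chosen for illustration, and the script assumes you run it with the same Python interpreter you plan to use in class.

```python
import sys

# Minimum version named in the software list above.
MINIMUM = (3, 8)


def meets_minimum(version_info, minimum=MINIMUM):
    """Return True when the (major, minor) pair is at least the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    # sys.version_info holds the running interpreter's version numbers.
    found = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum(sys.version_info):
        print(f"Python {found} meets the 3.8 minimum.")
    else:
        print(f"Python {found} is older than 3.8; please update.")
```

Saving this as a file and running it from the command line (or from inside your IDE) should report which interpreter is actually in use, which helps when more than one Python is installed on the same machine.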
Class Web Resources:
Week 1 |
Class topics
|
Do before class
|
M 01-13
|
- Welcome! Intro to the course and its themes of text analysis, re-mediation, and visualization.
- Hands-on warm-up with Scalable Vector Graphics: SVG in oXygen XML Editor.
- Genuary activities.
|
Respond to Dr. B’s Canvas announcement, install/update oXygen XML Editor. |
W 01-15
|
Data to numbers to shapes with legible, human-readable SVG code. |
- Install/update oXygen XML Editor if you have not done so already.
- SVG Exercise 1: Orientation
- Join / reactivate the DIGIT-coders Slack
|
F 01-17
|
- Class protocols for handling code files: GitHub and version-controlled file management. Making a branch on the textAnalysis-Hub. Review pulling, adding, committing, and pushing.
- Gentle XPath orientation / review: Pulling data from DIGIT 110 projects to plot in SVG
|
|
Week 2 |
Class topics
|
Do before class
|
M 01-20
|
Martin Luther King Day: No classes. |
... |
W 01-22
|
Orientation: Programming your visual design: XSLT / XQuery to SVG |
|
F 01-24
|
- Programming your visual design: XSLT / XQuery to SVG.
- Contemplating the flow of text to image via code.
|
SVG Exercise 3 |
Week 3 |
Class topics
|
Do before class
|
M 01-27
|
- Improving designs / layout on websites: XSLT / XQuery to SVG
- Preview Regular Expressions (Regex) unit
|
SVG Exercise 4 |
W 01-29
|
- Structuring and regularizing data from documents with markup.
- Introduce document analysis with Regular Expressions: the dot, the backslash, numbers (\d), repetition indicators, matching on lines, and autotagging. Greedy and non-greedy matching.
- Preview Intro to Regular Expressions
- Choosing a license for your project GitHub repo.
|
- Watch Regex Orientation Videos:
- Regex Exercise 1
|
F 01-31
|
- Regular Expressions: Thinking (and writing) in markdown, algorithmically. The fine art of Looking Stuff Up in the Regex tutorials: character sets, symbols, capturing groups.
|
Regex Exercise 2. As you work on this, consult our Intro to Regular Expressions and the Regular Expressions Quick Start. |
Week 4 |
Class topics
|
Do before class
|
M 02-03
|
Regex greedy and non-greedy matches. |
- Regex Exercise 3
- (By the end of the day): Five Days of Git: Part 1: Record completion on Canvas as part of GitHub Test
|
W 02-05
|
Regex and project applications / ideas. |
- Regex Exercise 4
- Five Days of Git: Part 2: Record completion on Canvas as part of GitHub Test
|
F 02-07
|
- Semester project ideas
- Validity for a project: what is a schema? What is schema validation?
- Validation for Google Sheets
- How to write a Relax NG schema (review for some / intro for others)
|
- Five Days of Git: Part 3: Record completion on Canvas as part of GitHub Test
|
Week 5 |
Class topics
|
Do before class
|
M 02-10
|
- Introduce Regex Test
- Good projects: ideas, sources, teamwork expectations: discussion
- Relax NG: data types and mixed content
- Troubleshooting and debugging Relax NG
|
- Relax NG Exercise 1 (before class)
- Five Days of Git: Part 4: Record completion on Canvas as part of GitHub Test
|
W 02-12
|
- Relax NG schemas for project management
- Project ideas
|
- Relax NG Exercise 2 (before class)
- Five Days of Git: Part 5: Record completion on Canvas as part of GitHub Test
|
F 02-14
|
- Review / discuss project proposal assignment
- Relax NG: Common problems (mixed content, repetition indicators). Simplifying your code. Documenting your schemas
|
- Respond to Requirements to Initiate Semester Projects
- Project proposals part 1: Post proposal ideas in Slack Project Proposals + Discussion
- Relax NG Exercise 3
|
Week 6 |
Class topics
|
Do before class
|
M 02-17
|
Form project teams! |
- Project proposals part 2: Respond to Slack Project Proposals + Discussion before class
- Complete Regex Test
|
W 02-19
|
- GitHub Pages review / Project websites
- Building project text corpora: Resources and approaches to scraping
- Copyright, proprietary ownership, legality issues
|
Project Milestone 1: Set up Project GitHub + Slack channel, arrange regular team meeting times |
F 02-21
|
- Introducing Invisible XML (ixml): crafting your own grammars
- ixml and XProc pipeline demonstration
|
|
Week 7 |
Class topics
|
Do before class
|
M 02-24
|
- Checking / troubleshooting PyCharm and Python installations
- PyCharm Edu tutorial work together. Manipulating strings with Python, and Pythonic data structures (lists, tuples, dictionaries).
|
- PyCharm Edu tutorials: Complete through the Strings unit (submit evidence of completion via screen capture on Canvas).
|
W 02-26
|
- Python tutorial Q/A: tinkering.
- Python at the command line vs. in the PyCharm IDE (or oXygen, VS Code, etc.)
|
- PyCharm Edu Community tutorials: Complete the tutorial through the Condition expressions unit (submit evidence of completion via screen capture on Canvas).
|
F 02-28
|
Project team workday |
- Work on the PyCharm Edu tutorials: Get at least partway through the Classes and Objects unit.
- Project Milestone 2
|
Week 8 |
Class topics
|
Do before class
|
M 03-03
|
Class on Zoom: Topic [TBD]: Python introduction Q/A, demonstration. Preparation / methods for curating and data modeling of texts. What can we do / access with Python methods?
Dr. B is attending a Symposium in Tokyo this week.
|
- Finish PyCharm Edu Intro to Python tutorials: Classes and objects, Modules and packages, File input and output. Submit evidence of completion via screen capture on Canvas.
|
W 03-05
|
Class on Zoom: Topic [TBD]:
- Invisible XML (ixml) for document analysis. Setting up XProc (demonstration / application)
- Building a pipeline with Python or with XProc
|
ixml Exercise 1 |
F 03-07
|
Class on Zoom: Topic [TBD]: Eliminating ambiguity in ixml. Will ixml work in my project? |
ixml Exercise 2 |
Sun 03-09 - Sat 03-15 |
Spring Break |
Enjoy this week! |
Week 10 |
Class topics
|
Do before class
|
M 03-17
|
- Orientation to Natural Language Processing and how AI works.
- Python: Review working with libraries (modules and packages), saving and executing files.
- Writing your own Python: creating and sharing an environment on GitHub
|
[TBD] Readings on AI, large language models, word embeddings. |
W 03-19
|
- Ethical issues and research paths with AI, NLP, word embeddings. Where do we find arts and humanities in computational processing?
|
AI / NLP Exercise 1: Preparing sample generative AI outputs for similarity analysis. |
F 03-21
|
Writing your own Python: Getting started with Natural Language Processing (NLP). Installations / imports: nltk, spaCy, gensim |
AI / NLP Exercise 2: Reading sample AI output text files, accessing vector data. |
Week 11 |
Class topics
|
Do before class
|
M 03-24
|
- Word embeddings and the concept of cosine similarity: a humanities perspective
- NLP and large language models, vs. customized, specialized modeling.
|
AI / NLP Exercise 3: Comparing sets of files using word vector data (words similar to a word of interest) |
W 03-26
|
Introducing Python web scraping with Beautiful Soup and lxml etree |
Python exercise / orientation to scraping libraries |
F 03-28
|
- Revisiting pipelines: scraping, cleaning, preparing outputs
- NLP: Named Entity Recognition (NER), sentiment analysis: problems and possibilities
|
Python exercise 1: web scraping + NER and/or sentiment analysis |
Week 12 |
Class topics
|
Do before class
|
M 03-31
|
- NER, sentiment analysis: benefits and problems
- [TBD] Introducing Topic Modeling with nltk (LDA) or spaCy
- Readings / examples re limits of NLP libraries, language / time barriers
|
Python exercise 2: web scraping + NER and/or sentiment analysis continued
|
W 04-02
|
Topic modeling code and visualization |
Python exercise 3: starting topic modeling |
F 04-04
|
Visualizing / troubleshooting / problem solving. Applying to projects. |
Python exercise 4: tinkering with / visualizing topic modeling |
Week 13 |
Class topics
|
Do before class
|
M 04-07
|
Moving between unstructured and structured documents for data modeling.
XQuery, eXist-db, and Python pipelines. |
eXist-db and Cytoscape project prep: Installations |
W 04-09
|
- XQuery and Python methods: Network Analysis vs. Topic Modeling
- XQuery FLWOR statements
- XQuery to TSV or JSON for network analysis
|
Network Analysis Exercise 1: structured data extraction |
F 04-11
|
- XQuery work on FLWOR statements
- Network statistics: degree, closeness, eigenvector centrality measures. Path steps. The concept of eccentricity and distance.
|
Network Analysis Exercise 2: Cytoscape import / visualization |
Week 14 |
Class topics
|
Do before class
|
M 04-14
|
Network Analysis: Debugging the source files via visualization |
Network Analysis Exercise 3: Refining and exporting network visualizations |
W 04-16
|
- Introduce Python / XQuery Test
- Schematron, Python Assertions, and other debugging methods
|
Looking for trouble: Project bug-finding exercise |
F 04-18
|
Python and XML handshake: Saxon C Library: XPath, XSLT, XQuery in Python |
... |
Week 15 |
Class topics
|
Do before class
|
M 04-21
|
- Project documentation and reflection: What do you know? What is not certain? Documenting the limits.
- Mermaid.live / markdown to flow diagrams
|
|
W 04-23
|
Return to SVG: Project visualizations |
Documentation: Flow diagram exercise |
F 04-25
|
Catch-up day. |
Python / XQuery Test |
Week 16 |
Class topics
|
Do before class
|
M 04-28
|
Putting it all together: Discussion, analysis, documentation, web work. Ethics in public-facing digital data representation. |
Project development sprint, prep for DIGIT Works presentation |
W 04-30
|
Team sprint day in class |
Project development sprint, prep for DIGIT Works presentation |
F 05-02
|
Last Day! Project Milestone: Teams deliver DIGIT Works presentations |
Prep for presentations |
Finals Week: May 5 - 9
|
To Complete
|
W 05-07
|
Semester projects due by 11:59 pm.
Finish developing projects, and send me a post on GitHub and Canvas to indicate your team is finished.
|