Note: this project is a few years old, but serves as the germ of the idea for mute.land. Austin worked with David Rheams (a frequent collaborator) to create this system.

The Text Parsing and Analytics Tool is a lightweight text analysis application designed to assist researchers performing content analysis on newspaper articles, particularly those working with articles from numerous sources. The application takes text files and converts them to a table for analysis.

The application serves three main purposes:

  1. Import Articles: Gather text output from discrete sources (Westlaw, LexisNexis, ProQuest) and convert the batch of articles in the text file to a table.
  2. Run Query A: Run a regular expression on the body text of an article to locate syntactical distinctions. The output is a list of keywords that match the criteria defined by the regular expression.
  3. Run Query B: Take the list of keywords generated by Query A and provide additional context in a CSV file. Additional data includes:
     • The article ID of the article that contained the keyword
     • The total count of the keyword across
     • 30 characters surrounding the keyword
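As a rough illustration of the import step, the sketch below splits one exported batch into articles and writes them out as a tab-separated table. It is not the tool's own code. The separator pattern, the function names, and the `Article_ID` column are assumptions made for the example (only the `Article` column name appears in this README), and real exports differ by vendor.

```python
import csv
import io
import re

# Hypothetical separator: a line such as "3 of 120 DOCUMENTS".
# Adjust this pattern to whatever actually divides articles in your export.
SEPARATOR = re.compile(r"^[ \t]*\d+ of \d+ DOCUMENTS[ \t]*$", re.MULTILINE)


def batch_to_rows(batch_text):
    """Split one exported batch into (article_id, body) rows."""
    chunks = (chunk.strip() for chunk in SEPARATOR.split(batch_text))
    bodies = [chunk for chunk in chunks if chunk]
    return list(enumerate(bodies, start=1))


def rows_to_tsv(rows):
    """Render the rows as TSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Article_ID", "Article"])
    for article_id, body in rows:
        # Collapse internal whitespace so each article stays on one line.
        writer.writerow([article_id, " ".join(body.split())])
    return out.getvalue()
```

Numbering the articles sequentially is only a stand-in for whatever ID scheme the database uses.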

Current state of software

  • This is an early-stage application with minimal security. We recommend connecting the Text Parsing and Analytics Tool to a local database rather than a server.
  • Parsing Westlaw and ProQuest documents generates tab-separated value (TSV) output suitable for entering into the main database ("Master List"). At this stage of development, only LexisNexis imports commit directly to the SQL database.

Installation Instructions

  • Download this repository
  • Install the "/TpaTool" folder on your server
  • Create a MySQL database, user, and password
  • Open "/TpaTool/settings.php" and enter your database, user, and password information
  • Save and run

License

Attribution-NonCommercial-ShareAlike 4.0 International


Code Snippets & Queries:

Query A Example:

The application allows a user to insert whatever regular expression they need. The default regular expression locates two or more consecutive capitalized words (proper nouns).

//Reg Ex Default:
([A-Z][a-zA-Z0-9-]*)([\s][A-Z][a-zA-Z0-9-]*)+
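To make the behaviour of the default expression concrete, here is a minimal Python sketch that applies it to a body of text. It assumes each character class is meant to carry a `*` quantifier, so that whole words are matched, which is what the description "two or more consecutive capitalized words" implies. The function name `find_keywords` is hypothetical and not part of the tool.

```python
import re

# Default pattern, assuming a "*" quantifier after each character class.
PROPER_NOUN_RUN = re.compile(r"([A-Z][a-zA-Z0-9-]*)([\s][A-Z][a-zA-Z0-9-]*)+")


def find_keywords(body):
    """Return each distinct run of capitalized words, in order of first appearance."""
    seen = []
    for match in PROPER_NOUN_RUN.finditer(body):
        phrase = match.group(0)
        if phrase not in seen:
            seen.append(phrase)
    return seen


if __name__ == "__main__":
    text = (
        "Officials at the Environmental Protection Agency met Jane Doe. "
        "Later, Jane Doe left."
    )
    print(find_keywords(text))
```

A single capitalized word such as "Officials" or "Later" is skipped because the pattern requires at least one more capitalized word to follow. A capitalized sentence opener that sits directly before a proper noun would be included in the match, so the resulting keyword list may need manual review.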

Query B Example:

Allows a user to paste in the keywords generated by Query A and provides additional context in a CSV file (the SQL GROUP BY clause is configurable).

//Query:
foreach $keyword
    SELECT Pub_Date, count(*)
    FROM Master_List
    WHERE match(Article) against('$keyword')
    // THE GROUP BY COLUMN MAY CHANGE
    GROUP BY Pub_Date;
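The additional context that Query B attaches to each keyword (article ID, total count, surrounding characters) can be sketched in plain Python without a database. This is an illustration only, not the tool's implementation. It assumes articles are available as a dictionary of ID to body text, that "30 characters surrounding" means 30 characters on each side, and that the total count is taken over all supplied articles. The function names and CSV column headers are hypothetical.

```python
import csv
import io

CONTEXT_CHARS = 30  # assumed: characters kept on each side of the keyword


def keyword_context_rows(articles, keywords):
    """Build one (keyword, article_id, total_count, context) row per occurrence.

    articles: dict mapping article ID to body text.
    """
    rows = []
    for keyword in keywords:
        # Total occurrences of this keyword over every supplied article.
        total = sum(body.count(keyword) for body in articles.values())
        for article_id, body in articles.items():
            start = body.find(keyword)
            while start != -1:
                end = start + len(keyword)
                context = body[max(0, start - CONTEXT_CHARS):end + CONTEXT_CHARS]
                rows.append((keyword, article_id, total, context))
                start = body.find(keyword, end)
    return rows


def rows_to_csv(rows):
    """Render the rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Keyword", "Article_ID", "Total_Count", "Context"])
    writer.writerows(rows)
    return out.getvalue()
```

Plain substring matching is used here for simplicity. MySQL's `match ... against` full-text search tokenizes and ranks text differently, so counts from this sketch will not necessarily agree with the SQL query above.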

Authors

  • Code Development: Austin Meyers (AK5A)
  • Research & Query: David Rheams (Dr-Heams)