Jake.codes

WhatIsThat.py

Thursday, February 19, 2015

This is a super simple regular expressions test to see if I could safely indentify text as either:

  1. a calendar event
  2. an email address
  3. a url
  4. an location
  5. a phone number

Once I have indentified it, I can then act upon it. I wrote this up in Python and Regex so I can make a little clipboard monitoring script on my iPhone. For example, if I have an address in my clipboard, I probably want directions on how to get there. If I have something with a date or time, I probably need to schedule it in my calendar. I’m still working on that part, but I feel like this is a handy starting point.

import re

query = "My Query Here"

regexes = [
    (
        'datetime',
        '((\d\d?:\d\d|\d+(a|p)m?)|\d{2,4}[- /.]\d{2}[- /.]\d{2,4}|(January|Jan|February|Feb|March|Mar|April|Apr|May|June|Jun|July|Jul|August|Aug|September|Sep|Sept|October|Oct|November|Nov|December|Dec|\d\d?) \d+(st|th|rd)?)'
    ),
    (
        'email',
        '[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}'
    ),
    (
        'url',
        '(\w+://(\w+)?|\d+\.\d+\.\d+\.\d+|\w+\.\w+)'
    ),
    (
        'location',
        '(Avenue|AVE|Boulevard|BLVD|Circle|CIR|Court|CT|Drive|DR|Freeway|FRWY|Highway|HWY|Lane|LN|Mews|Parkway|PKWY|Pike|Road|RD|Street|ST|Trail|TRL|Way|Alabama|AL|Montana|MT|Alaska|AK|Nebraska|NE|Arizona|AZ|Nevada|NV|Arkansas|AR|New Hampshire|NH|California|CA|New Jersey|NJ|Colorado|CO|New Mexico|NM|Connecticut|CT|New York|NY|Delaware|DE|North Carolina|NC|Florida|FL|North Dakota|ND|Georgia|GA|Ohio|OH|Hawaii|HI|Oklahoma|OK|Idaho|ID|Oregon|OR|Illinois|IL|Pennsylvania|PA|Indiana|IN|Rhode Island|RI|Iowa|IA|South Carolina|SC|Kansas|KS|South Dakota|SD|Kentucky|KY|Tennessee|TN|Louisiana|LA|Texas|TX|Maine|ME|Utah|UT|Maryland|MD|Vermont|VT|Massachusetts|MA|Virginia|VA|Michigan|MI|Washington|WA|Minnesota|MN|West Virginia|WV|Mississippi|MS|Wisconsin|WI|Missouri|MO|Wyoming|WY)'
    ),
    (
        'phone',
        '[0-9-,p\*\#]{5,}'
    ),
    (
        'unidentified',
        '.+'
    )
]

for type, regex in regexes:
    if re.search(regex, query, re.I | re.M) is None:
        continue
    else:
        print "%s: %s" % (type, query)
        break