Jake.codes

New Comic Book Releases as CSV

Monday, June 02, 2014

I made this Keyboard Maestro macro to get a list of upcoming comic book releases that I can put in my calendar. It uses a mix of JavaScript and Ruby to scrape the release pages and format the results as a CSV file.

Download Keyboard Maestro Macro

Available here!

Websites and Scrapes

Date of the Next Sunday

#!/usr/bin/ruby
require 'date'
# http://stackoverflow.com/questions/7930370/ruby-code-to-get-the-date-of-next-monday-or-any-day-of-the-week
def date_of_next(day)
  date = Date.parse(day)
  delta = date > Date.today ? 0 : 7
  date + delta
end
puts date_of_next('sunday')

Scrape These Websites

Marvel, Next Week's Releases

http://marvel.com/comics/calendar/week/

var output = "";
$('.row-item.comic-item').each(function () {
  output += '{ "title" => "' + $('.meta-title',this).text().trim() + '", "date" => "' + $('h6').text().trim() +'", "price" => "", "source" => "marvel.com"},';
});
output;

Image, This Month and Next Month’s Releases

http://imagecomics.com/comics/upcoming-releases/[[YYYY]]/[[M]]
http://imagecomics.com/comics/upcoming-releases/[[YYYY]]/[[M+1]]
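
The `[[M+1]]` placeholder needs care in December, when the next month is January of the following year. A minimal Ruby sketch of one way to build both URLs, assuming the site expects an unpadded month number; `image_release_urls` is a hypothetical helper, not part of the macro:

```ruby
require 'date'

# Hypothetical helper: URLs for the month containing `today` and the month after.
# Date#>> advances by whole months, so December rolls over into the next year.
def image_release_urls(today = Date.today)
  [today, today >> 1].map do |d|
    "http://imagecomics.com/comics/upcoming-releases/#{d.year}/#{d.month}"
  end
end

puts image_release_urls(Date.new(2014, 12, 15))
```

For a December date this yields the `/2014/12` URL followed by `/2015/1`.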

var output = "";
$('.release_box').each(function () {
  output += '{ "title" => "' + $('h1',this).text().trim() + '", "date" => "' + $('.pub_date',this).text().trim() +'", "price" => "", "source" => "imagecomics.com"},';
});
output;

Comixology Subscription Next Release Date and Prices

https://www.comixology.com/my-account/subscriptions

var output = "";
$('.activeSubs .subscription').each(function () {
  output += '{ "title" => "' + $('.title',this).text().trim() + '", "date" => "' + $('.releaseTime',this).text().trim() +'", "price" => "' + $('.price',this).text().trim() + '", "source" => "comixology.com"},';
});
output;

Comixology Pull List’s Next Week Releases

http://pulllist.comixology.com/nextweek/?limit=5000&viewby=image

var output = "";
$('#imageView td').each(function () {
  output += '{ "title" => "' + $('#title',this).text().trim() + '", "date" => "' + $('#results h2 strong').text().trim() +'", "price" => "", "source" => "pulllist.comixology.com"},';
});
output;
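
The scrapes above build Ruby hash literals by string concatenation, so a title containing a double quote would produce text that no longer parses as Ruby. One alternative, sketched here under the assumption that the JavaScript side collected plain objects and returned `JSON.stringify(rows)`, is to read the result with Ruby's standard `json` library. The `scraped` string and the `parse_comics` helper are made up for illustration:

```ruby
require 'json'

# Stand-in for what a scrape could return as JSON text (hypothetical data).
# Inside single quotes, \" stays a backslash plus quote, which is a valid JSON escape.
scraped = '[{"title":"SAGA #20","date":"June 25, 2014","price":"","source":"imagecomics.com"},' \
          '{"title":"The \"Quoted\" Comic","date":"TBD","price":"","source":"marvel.com"}]'

# Hypothetical helper: returns an array of hashes with string keys,
# the same shape that comic["title"] style lookups expect.
def parse_comics(json_text)
  JSON.parse(json_text)
end

comics = parse_comics(scraped)
comics.each { |comic| puts "#{comic['title']} (#{comic['source']})" }
```

Parsing JSON also means scraped page text is treated as data and never executed as code.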

Building CSV, Sorted and Filtered

#!/usr/bin/env ruby

require 'date'

comics = eval(ENV['KMVAR_comicsAll'])

regexMatch = /((X-MEN)|(SCOTT PILGRIM)|(LOCKE & KEY)|(HAWKEYE)|(FATALE)|(SAGA)|(RUNAWAYS)|(SCARLET)|(AMERICAN VAMPIRE)|(HIGH CRIMES)|(UNCANNY X-MEN)|(NOWHERE MEN)|(EAST OF WEST)|(X-FORCE)|(THE DARK TOWER)|(SAVAGE WOLVERINE)|(YOUNG AVENGERS)|(BANDETTE)|(CAPTAIN MARVEL)|(FLASH)|(DAREDEVIL)|(GUARDIANS OF THE GALAXY)|(SEX CRIMINALS)|(PRETTY DEADLY)|(AVENGERS ASSEMBLE)|(BITCH WORLD)|(ODY-C)|(THE FADE OUT)|(CASANOVA)|(ALL NEW ULTIMATES)|(ALL-NEW ULTIMATES)|(BLACK WIDOW)|(MS MARVEL)|(MS. MARVEL)|(ALL NEW X-FACTOR)|(ALL-NEW X-FACTOR)|(AMAZING SPIDER-MAN)|(AMAZING SPIDER MAN)|(MAGNETO)|(ORIGINAL SIN)|(SECRET AVENGERS)|(SHE-HULK)|(SHE HULK)|(NIGHTCRAWLER)|(WONDER WOMAN)|(THE WAKE))/i
regexBlock = /((ALIEN SAGA)|(ASTONISHING X-MEN)|(BOJEFFRIES SAGA)|(CATACLYSM ULTIMATE X-MEN)|(CLASSIC MARVEL CHARACTER)|(CLOWN FATALE)|(DOCTOR WHO)|(ELITE SAGA)|(EXCEL SAGA)|(FIUMARA)|(FLASH GORDON)|(GUMPS SAGA MARY)|(HANDLE MUG)|(JUSTICE LEAGUE)|(LEGION OF SUPERHEROES GREAT DARKNESS SAGA)|(MARADA SHE WOLF)|(MARVEL UNIVERSE AVENGERS ASSEMBLE)|(SAGA OF THE SWAMP)|(SAGAT)|(SAINT SEIYA)|(SCARLET SPIDER)|(SONIC SAGA)|(STAR WARS)|(SUPERMAN)|(THE SCARLET BLADES)|(TOON TUMBLERS)|(TWILIGHT FOREVER)|(TWILIGHT SAGA)|(VINLAND SAGA)|(YU GI OH))/i

# Get the next occurrence of a weekday strictly after a given date
# http://stackoverflow.com/questions/7930370/ruby-code-to-get-the-date-of-next-monday-or-any-day-of-the-week
def date_of_next(day, from)
  # Days until the target weekday, counted from the date passed in (not from today)
  delta = (Date.parse(day).wday - from.wday) % 7
  delta = 7 if delta.zero?
  from + delta
end

# Straighten up data, parse dates
comics.each do |comic|
  next if comic["date"] == "TBD"

  comic["title"] = comic["title"].gsub(',','').gsub(':','').gsub('&','and').gsub(/ \(.+?\)/i, "")

  if comic["date"].include?('Expected Delivery is ') then
    comic["date"] = comic["date"].gsub('Expected Delivery is ','')
  end

  if comic["date"].include?(' - ') then
    comic["date"] = date_of_next('Wednesday',Date.parse(comic["date"].gsub(/ - .+$/, "")))
  else
    comic["date"] = Date.parse(comic["date"])
  end
end

# Filter down to the comics I want
comics = comics.select{ |comic| regexMatch.match(comic["title"]) }
comics = comics.reject{ |comic| regexBlock.match(comic["title"]) }

# Remove later printings, volumes, hardcovers, trade paperbacks, and posters
comics = comics.reject{ |comic| /(HC|TP|Vol\.|2nd Ptg|3rd Ptg|\dth Ptg|Poster)/i.match(comic["title"]) }

# Sort, then merge the sources of duplicate entries and remove the duplicates
comics = comics.sort_by { |comic| [comic["date"].to_s,comic["title"],comic["source"]] }
comicsCount = comics.size
comics.each_with_index do |comic, index|
  next if index == (comicsCount-1)
  next if comic["title"] != comics[index+1]["title"]
  next if comic["date"] != comics[index+1]["date"]
  next if comic["source"] == comics[index+1]["source"]

  combinedSource = comic["source"] + ";" + comics[index+1]["source"]
  comics[index+1]["source"] = combinedSource
  comic["source"] = combinedSource
end
comics = comics.uniq

# Fix for Comixology Subscription data without Issue Numbers that could be duped
comicsCount = comics.size
comics.each_with_index do |comic, index|
  next if comic["price"] == ""
  next if index == (comicsCount-1)
  next if comics[index+1]["title"].include?(comic["title"]) == false
  next if comic["date"] != comics[index+1]["date"]

  comics[index+1]["price"] = comic["price"]
  comic["title"] = comics[index+1]["title"]

  # could return a false positive if it is being asked if pulllist.comixology.com contains comixology.com, but
  # since I sorted above, that direction probably won't occur or cause any problems if it were to occur.
  if (comic["source"] != comics[index+1]["source"]) && (comic["source"].include?(comics[index+1]["source"]) == false)
    combinedSource = comic["source"] + ";" + comics[index+1]["source"]
    comics[index+1]["source"] = combinedSource
    comic["source"] = combinedSource
  end
end
comics = comics.uniq

# Output as CSV if it has a date that is not before today.
comicCSV = ""
comics.each do |comic|
  next if comic["date"] == "TBD"
  next if comic["date"] < Date.today
  comicCSV += comic["title"] + "," + comic["price"] + "," + comic["date"].to_s + "," + comic["source"] + "\n"
end
print comicCSV
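
The script strips commas from titles so that the hand-built rows stay valid CSV. As a rough alternative, Ruby's `csv` library quotes fields that need it, so titles could keep their punctuation. This is only a sketch: `comics_to_csv` and the sample data are hypothetical, and it assumes the `csv` library is available (it ships with Ruby, as a bundled gem in recent versions).

```ruby
require 'csv'
require 'date'

# Hypothetical helper: one CSV row per comic dated `today` or later.
def comics_to_csv(comics, today = Date.today)
  CSV.generate do |csv|
    comics.each do |comic|
      next unless comic["date"].is_a?(Date) # skips unparsed dates such as "TBD"
      next if comic["date"] < today
      csv << [comic["title"], comic["price"], comic["date"].to_s, comic["source"]]
    end
  end
end

# Made-up sample rows in the same shape the script uses.
sample = [
  { "title" => "Saga, Chapter 20", "price" => "$2.99", "date" => Date.new(2014, 6, 25), "source" => "imagecomics.com" },
  { "title" => "Hawkeye 19", "price" => "", "date" => "TBD", "source" => "marvel.com" }
]
print comics_to_csv(sample, Date.new(2014, 6, 2))
```

The title with a comma comes out wrapped in double quotes, and the "TBD" entry is skipped.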