Nerdy Yet Awesome

Intro

Here's something that I cooked up a little while back, then found a further use for, then found an even greater use; it was an evolutionary project that sort of took off and helped me to learn a few things and adopt some new tools, which is always nice.

Building on what I've been talking about recently, namely a greater understanding of front-end development tooling and practices and a skill set that reaches outside our day-to-day work, here's an interesting thing I did recently to solve a unique problem.

Ultra Nerdy

How nerdy was it? We'll get to that by the end and I'll let you be the judge. The problem I was faced with was reconciling some content for a side project, which is written in Markdown (md). The content I needed to update was a number of emojis that had been rendered via a plugin, which recognizes a short name wrapped in a pair of colons. For example, :smile: becomes 🙂, as you can see in this blog post's source vs. the rendered content here. The plugin isn't available for the destination, but the destination, like all Markdown, will render HTML. While I may be packaging what I've done as a plugin for that environment now, the same functionality wasn't there when I started, so I had to DIY.

Ultimately, my task was to:

  • scan my existing files for occurrences (such as :beers:, which happens a surprising amount)
  • register what needed to be replaced
  • replace with a corresponding HTML image tag in place of the md image markup, that is, an img tag with a src attribute pointing to a copy of the image file corresponding to the short code, such as src="https://path/to/beers.png"
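The three steps above can be sketched against a single in-memory string. This is only a minimal illustration, not the script from later in this post: the function names toImgTag and replaceShortNames are made up for the example, and the base URL is the same placeholder used in the list above.

```javascript
'use strict';

// Assumed placeholder location for the image copies (mirrors the example URL above).
const IMAGE_BASE = 'https://path/to/';

// Hypothetical helper: build the img tag for one short name.
function toImgTag(shortName) {
  return '<img src="' + IMAGE_BASE + shortName + '.png" alt="' + shortName + '">';
}

// Hypothetical helper: scan the text, and replace only the short names we know about.
function replaceShortNames(text, knownNames) {
  return text.replace(/:([\w+-]+):/g, function (match, shortName) {
    return knownNames.indexOf(shortName) > -1 ? toImgTag(shortName) : match;
  });
}

const sample = 'Time for :beers: but not :notanemoji: today';
const replaced = replaceShortNames(sample, ['beers', 'smile']);
console.log(replaced);
```

Anything between colons that isn't a known short name is left untouched, which matters once real prose (timestamps, URLs with ports) gets involved.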

Something Neat

The side project I'm working on is collecting my better blog posts into an eBook format. This is mostly to be able to say I've done so, and when that finally hits, it'll be freely available in all major eBook formats. If you keep referencing my blog, that's the primary source, so no worries there.

GitBook is a project that Jesse Gallagher turned me on to. It's a full service that connects to your GitHub account and automates some of what I'm doing, but I went the DIY route and am merely using their CLI implementation (gitbook-cli), available via npm, as outlined in the readme of their GitHub repository. Basically, when I update the git repository that is the "book" version of the first year of my blog, my Jenkins instance pulls in the latest changes, runs the appropriate npm install (and other affiliated plugin install tasks), then runs the commands to generate the appropriate formats. As a bit of a convenience, so I don't have to wait around for anything, it automatically publishes them into a folder in my Dropbox account (which is token authorized). I even have the files being named according to the build number.

Again, this was just something to force myself to get familiar with some tools that I'm looking to do more with, and I'm quite happy with that result, as I've never felt better about my Jenkins-fu and GitHub or Bitbucket webhooks.

Bringing It All Together

To achieve my results, I needed to scan multiple files (all in a given directory) of a particular type (or at least with a given extension), check their contents with what would ultimately be a RegEx test to decide whether to perform the replace tasks, and then save any changes back to the original file. I've done some BASH scripting before and even a little Windows batch work, but I wanted something I was familiar enough with to be fluent, with minimal effort; this is, after all, nothing more than a "glamor" project, so the KISS approach was ideal. I settled on writing a NodeJS based script, seeing how I'm already using Node for the project and have a set of dependencies that could easily tie into the package.json as development dependencies. Better yet, Node includes an out of the box file system API which is well documented.

The RegEx

Coming up with the correct RegEx is what took the longest. There are over 800 separate emojis supported on GitHub. The plugin I am using on my blog is jemoji, which is available on GitHub Pages. GitHub Pages doesn't have every Jekyll plugin available, but the list has been growing steadily since my blog started (from around 3 then to several now).

Ultimately, not caring about the simple "emojis" (such as :-D), rather just the true emoji short names, I started by examining the entire sample set of emoji short names. GitHub has an undocumented API (really just a JSON reference) at api.github.com/emojis. This let me know that all the emoji short names I would potentially have to match (even though I've only used certain ones), would be a list of effectively word characters immediately surrounded by colons.

To make a long story short, a match has to be followed by white space, a sentence-ending character, or the end of a line, and the short name itself may contain an explicit + or - character; you can find my full RegEx test on regex101.com. The expression I settled on is:

/(\:(\w|\+|\-)+\:)(?=\s|[\!\.\?]|$)/gim

Hopefully that makes sense after my description above. If not, check out the RegEx101.com link, as they have some helpful tools on the side that explain character matching, match successes, etc.
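To see the expression at work without leaving Node, here is a small sketch that runs it against two made-up sentences. The function name findShortNames and the sample strings are only for illustration.

```javascript
'use strict';

// The expression from above: a colon-wrapped name made of word characters,
// "+" or "-", followed by white space, a sentence-ending character, or end of line.
const shortNameRe = /(\:(\w|\+|\-)+\:)(?=\s|[\!\.\?]|$)/gim;

// Hypothetical helper: return every short name found, or an empty array.
function findShortNames(text) {
  return text.match(shortNameRe) || [];
}

const found = findShortNames('Cheers :beers: and :+1:! Done :smile:');
const notFound = findShortNames('Party (:tada:) time');

console.log(found);    // [ ':beers:', ':+1:', ':smile:' ]
console.log(notFound); // []
```

The second sentence comes back empty because the closing colon is followed by a parenthesis, which the lookahead doesn't allow; whether that strictness is desirable depends on how your own content is written.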

The Images

With my ability to receive an object with keys identifying all possible emoji short names and a corresponding URL, all seemed good and well, until I realized that I was still going to need an offline copy of the image, as my eBook renderings were completing before the response of all images had completed. So I scripted out another task which I won't get into here, that saved a copy of each image into a local folder, named according to a <shortName>.png format. Finally, I had all the components I needed.
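I'm not reproducing that image-saving task, but the naming convention is simple enough to sketch. The following is an assumption-laden illustration, not the script I ran: the sample object only mimics the short name to URL shape described above, its URLs are placeholders, and planDownloads is a made-up helper that stops short of doing any network work.

```javascript
'use strict';
const path = require('path');

// Hypothetical sample shaped like the short-name -> URL object;
// the URLs are placeholders, not real GitHub asset URLs.
const sampleEmojis = {
  beers: 'https://example.com/emoji/beers.png?v=1',
  '+1': 'https://example.com/emoji/plus-one.png?v=1'
};

// Hypothetical helper: pair each source URL with a local <shortName>.png destination.
function planDownloads(emojis, targetDir) {
  return Object.keys(emojis).map(function (shortName) {
    return {
      url: emojis[shortName],
      dest: path.posix.join(targetDir, shortName + '.png')
    };
  });
}

const plan = planDownloads(sampleEmojis, 'images/emoji');
console.log(plan);
```

Each entry in the resulting list can then be handed to whatever HTTP client you prefer to fetch url and write it to dest.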

Creating and Running the Script

Something I ran into while creating and testing the script was that Node's async nature was processing the file, inside the nested callback function, at a different time than I had anticipated. While this can seem counterintuitive, it's part of the non-blocking I/O nature of Node; I was able to switch over to the synchronous versions of the same functions, fs.readdirSync(path) and fs.readFileSync(file[, options]), though I kept fs.writeFile(file, data[, options], callback) async, as I didn't need to wait around for the file write to complete before continuing processing. For a quick idea of how this is beneficial, this SlideShare deck seemed mildly worthy, although this is a huge topic in and of itself.
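As a rough sketch of that mix (synchronous reads, asynchronous write), the following works against a throwaway temp directory. The helper name readMarkdownFiles, the file names, and the hard-coded replacement are all made up for the example.

```javascript
'use strict';
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: synchronously read every .md file in a directory.
function readMarkdownFiles(dir) {
  return fs.readdirSync(dir)
    .filter(function (name) { return path.extname(name) === '.md'; })
    .map(function (name) {
      const file = path.join(dir, name);
      return { file: file, contents: fs.readFileSync(file, 'utf-8') };
    });
}

// Set up a throwaway directory with one Markdown file and one other file.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'emoji-demo-'));
fs.writeFileSync(path.join(demoDir, 'post.md'), 'Cheers :beers:');
fs.writeFileSync(path.join(demoDir, 'notes.txt'), 'not markdown');

// The reads block until finished, so docs is complete on the next line.
const docs = readMarkdownFiles(demoDir);

docs.forEach(function (doc) {
  const updated = doc.contents.replace(':beers:', '<img src="/images/emoji/beers.png" alt="beers">');
  // The write is queued and finishes later; nothing below waits for it.
  fs.writeFile(doc.file, updated, 'utf-8', function (err) {
    if (err) {
      console.error('error: ' + err);
      return;
    }
    console.log('wrote ' + doc.file);
  });
  console.log('queued write for ' + doc.file);
});
```

The "queued write" line prints before the "wrote" line, which is the same ordering surprise described above, just confined to the one step where it does no harm.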

Before I show you the full version, here's a basic overview of both how my script is structured and the async nature of Node.

#!/usr/bin/env node
var fs = require('fs');

console.log('starting');
// async: the callback runs later, once the directory listing is available
fs.readdir('./', function(err, files){
  if (err) { throw err; }
  console.log('searching for .md');
  files.filter(function(file){ return file.substr(-3) === '.md'; })
    .forEach(function(file){
      console.log('reading a .md file: '+file);
    });
});
// this logs before the callback above has run
console.log('done');

You may take note of the first line, #!/usr/bin/env node, which points to my local node binary (as picked up from the environment), a.k.a. the "hash bang". This is akin to how one might specify a shell script, such as #!/bin/sh. Basically, so long as you make the file executable, you can run a Node script as if it were a shell script, since the first line tells the system which interpreter to use; it's perfectly legitimate! This is ultimately just a nifty thing, and one should take care not to use any packages that can't be resolved from a globally installed context, as most people don't like random node_modules directories strewn about their file systems. Running ./script.js this way is equivalent to invoking the same script via node script.js.

Here's the full thing, in its original form:

#!/usr/bin/env node
var fs = require('fs'),
  path = require('path'),
  request = require('request'),
  argv = require('minimist')(process.argv.slice(2)),
  filePath = argv._[0], // directory to scan, given as the first CLI argument
  imagePathPrefix = '/images/emoji/',
  emojisUrl = "https://api.github.com/emojis",
  emojisOb = {},
  options = {
    url: emojisUrl,
    headers: {
      'User-Agent': 'Awesome-Octocat-App'
    }
  },
  // quick gate for "does this file contain any short name at all?";
  // no "g" flag here, since a global RegExp carries lastIndex from one
  // test() call to the next and can miss matches when reused across files
  re = /(\:(\w|\+|\-)+\:)(?=\s|[\!\.\?]|$)/im;
console.log('starting');
request(options, function (error, response, body) {
  if (!error && response.statusCode == 200) {
    var fCt = 0; // .md files read
    var mCt = 0; // .md files containing at least one short name
    emojisOb = JSON.parse(body);
    var fileNameAr = fs.readdirSync(filePath);
    for( var i=0; i<fileNameAr.length; i++ ){
      // path.join works whether or not filePath ends with a slash
      var curVal = path.join(filePath, fileNameAr[i]);
      if( curVal.substr(-3) === '.md' ){
        var file = curVal;
        var contents = fs.readFileSync(file, 'utf-8');
        //fs.readFile(file, 'utf-8', function(err, contents){
        fCt++;
        if( re.test(contents) ){
          console.log('match found in '+file);
          mCt++;
          var result = contents;
          var foundMatch = false;
          for( var prop in emojisOb ){
            if( contents.indexOf(':'+prop+':') > -1 ){
              foundMatch = true;
              console.log('found a match for '+prop+' in '+file);
              // escape RegExp special characters so names such as "+1" match literally
              var nwRe = new RegExp(":"+prop.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')+":","gi");
              result = result.replace(nwRe, '<img src="'+imagePathPrefix+prop+'.png'+'" alt="'+prop+'" style="height:auto;width:21px;">');
            }
          }
          if(foundMatch){
            // async on purpose: no need to wait for the write before moving on
            fs.writeFile(file, result, 'utf-8', function(er){
              if(er){
                console.log('error: '+er);
              }else{
                console.log('writing file back with updates');
              }
            });
          }
        }
        //});
      }
    }
    console.log('found '+fCt+' .md files, '+mCt+' of them containing emoji short names');
  }else{
    // response is undefined when the request itself failed
    console.log('error getting '+emojisUrl+': '+(error || 'response status code '+response.statusCode));
    console.log('nothing to do, exiting');
  }
});
// logs right away, before the request callback above has run
console.log('done');
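Two RegExp behaviors are easy to trip over in a script like this one: an expression with the g flag remembers where its last test() ended, and a short name such as +1 contains a character that means something to the RegExp engine. The sketch below isolates both; the sample strings, the [img] stand-in, and the escapeRegExp helper name are made up for the illustration.

```javascript
'use strict';

// 1) A RegExp with the "g" flag keeps state (lastIndex) between test() calls.
const globalRe = /(\:\w+\:)(?=\s|[\!\.\?]|$)/gim;
const firstTest = globalRe.test('Long intro, then :smile:'); // match ends at index 24
const secondTest = globalRe.test(':beers: first');           // search resumes past the end of this string

// Without "g", every test() starts from the beginning.
const plainRe = /(\:\w+\:)(?=\s|[\!\.\?]|$)/im;
const thirdTest = plainRe.test(':beers: first');

// 2) Short names dropped into new RegExp() need escaping.
// Hypothetical helper: escape characters that are special in a RegExp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Unescaped, ":+1:" reads as "one or more colons, then 1, then a colon".
const unescaped = 'Nice :+1:'.replace(new RegExp(':' + '+1' + ':', 'gi'), '[img]');
const escaped = 'Nice :+1:'.replace(new RegExp(':' + escapeRegExp('+1') + ':', 'gi'), '[img]');

console.log(firstTest, secondTest, thirdTest); // true false true
console.log(unescaped); // Nice :+1:
console.log(escaped);   // Nice [img]
```

Either dropping the g flag from an expression used only for test(), or resetting lastIndex to 0 before each call, avoids the first problem; which one reads better is a matter of taste.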

Effort For Gain

In the end, I had something around 40 occurrences of emoji short names, so this may not have saved me much more work than I could have done with something else, but it was worth the experience to get more familiar with how to do such a thing in Node and also for what is in the next section.

Tying Into My Jenkins Build Task

As you may have caught on from my blog series, I'm a big fan of task runners. Jenkins CI, another tool I have a great love for, is essentially a highly configurable task runner (and more) in its own right. It's also a great tool for build automation and, if it hasn't hit yet, is the subject in another blog post in the task runner series of mine.

As for how to hook this into my Jenkins process, I ultimately am running a shell invocation for my "build" process. I could have multiple, but for this task, it's relatively uncomplicated, so it's just one. There are those in the camp that all build tasks should be contained in individual shell scripts, so that all the Jenkins configuration does is invoke the shell script, which has the advantage that it can be maintained independent of Jenkins, but I find it easy enough to log into my Jenkins instance to do so; this is one of those things that everyone will have a preference for, so go with what works for you.

To add it into my build task, after calling my npm install (and gitbook install) but before my building of my eBook files (a), I

One Step Further

I had created something neat that I hadn't seen before (my search for an existing npm package turned up nothing that suited my purposes), and in doing so I tackled a small challenge in an environment I was previously less knowledgeable in. It also gave me an opportunity to try out something else new in a more in-depth fashion: Yeoman's generator-node. For those who have followed my blog series on task runners, caught my recent IBM Connect session in person, or are waiting on the pending release of my Notes in 9 of my session's highlights, you may be aware that I've mentioned that when it comes to Yeoman, there seems to be a generator for nearly everything. Using generator-node, I was able to fairly quickly scaffold out a full project that's a nicely contained npm package which is installable from the npm registry and contains a (server) module for use via a require statement in a JavaScript context, or as a command via the CLI. It even has unit tests, continuous integration via travis-ci, dependency checking via david-dm, code coverage reporting via codecov, and... you get the picture, just check the readme's badges at the top.


Another Step Further

This is an update to the original article, expanding on some further developments that didn't quite warrant their own post.

As it turns out, by the time I got around to (finally) creating the gitbook-plugin- version of this script, someone else had created one to do something similar, using emojify.js, which is another well-maintained package. All in all, I would probably have used that, had it not been for the fact that I wrote the above. For those interested, I checked, and their initial commit is 6 days after I originally started down this path.

I also grew tired of needing a Jenkins CI instance to do what I could do for free using one of the multitude of options available to an open source GitHub repository, which was always going to be the end destination for the originating project. That being said, I created a Travis CI task for the repository, which does what the Jenkins CI task did, but instead publishes the static site (of the ebook format, from gitbook-cli) to a separate, gh-pages branch, and the built ebook files to a built branch.

Here's the .travis.yml configuration file. The reason it got larger than expected was due to the need to resolve the dependency of calibre, which wasn't working as I expected in the legacy vm. You can also notice that I'm using a GitHub Personal Access Token exposed (privately) to Travis CI as an environment variable for authenticated access, without exposing my credentials.

language: node_js
sudo: required
dist: trusty
node_js:
  - v4
cache:
  directories:
    - node_modules
# whitelist
branches:
  only:
    - master
before_script:
  - sudo add-apt-repository ppa:ubuntu-toolchain-r/test -y
  - sudo apt-get update -qq
  - sudo apt-get install -qq libyajl-dev libxml2-dev libxqilla-dev
  - if [ "$CXX" = "clang++" ]; then sudo apt-get install -qq libstdc++-4.8-dev; fi
  - if [ "$CXX" = "g++" ]; then sudo apt-get install -qq g++-4.8; fi
  - if [ "$CXX" = "g++" ]; then export CXX="g++-4.8" CC="gcc-4.8"; fi
  - sudo -v && wget -nv -O- https://raw.githubusercontent.com/kovidgoyal/calibre/master/setup/linux-installer.py | sudo python -c "import sys; main=lambda:sys.stderr.write('Download failed\n'); exec(sys.stdin.read()); main()"
  - npm i -g node-gyp && node-gyp clean
  - npm i -g svgexport
script:
  - sudo chown -R $USER ~/
  - ./node_modules/emoji-transmogrifier/src/cli/index.js zap
  - gitbook install
  - gitbook build
  - gitbook pdf . build/DevBlog_Year1.pdf
  - gitbook mobi . build/DevBlog_Year1.mobi
  - gitbook epub . build/DevBlog_Year1.epub
after_success: |
  if [ -n "$GITHUB_API_KEY" ]; then
    rm -rf .git
    git config --global user.name "travis"
    git config --global user.email "travis"
    cd _book
    git init
    git remote add origin https://$GITHUB_API_KEY@github.com/edm00se/dev-blog-book
    git checkout -b gh-pages
    git fetch
    git add .
    git -c user.name='travis' -c user.email='travis' commit -m 'Travis CI Deploying'
    git push -f origin gh-pages
    # Make sure to make the output quiet, or else the API token will leak!
    # This works because the API key can replace your password.
    cd ..
    cd build
    git init
    git remote add origin https://$GITHUB_API_KEY@github.com/edm00se/dev-blog-book
    git fetch
    git checkout -b built
    git add .
    git -c user.name='travis' -c user.email='travis' commit -m 'Travis CI Deploying'
    # Make sure to make the output quiet, or else the API token will leak!
    # This works because the API key can replace your password.
    git push -f origin built
    cd ..
  fi

Summary

All in all, I learned a few things, found a solution to a problem I had, and grew from the experience. I call this a "win". Hopefully this can inspire some of you to give something new a try, as it never hurts to expand the skill set.