Friday 10 April 2020

Hmmm, cURL, why did you stop responding ?

So, a few weeks back, I was working on a Bash script that uses cURL to drive a set of REST APIs, and parses the responses, separating out the HTTP response code e.g. 200, 201, 301, 403 etc. from the actual response body.

This allowed me to write conditional logic based upon the return code, using IF/ELSE/FI.

This worked, and did the job for which it was intended, and everyone was happy .....

Fast forward a few weeks, and ... the script starts breaking ...

When I dig into the WHAT and WHY, I realise that the HTTP response codes are no longer being parsed from the response, even though the response still includes them.

After some serious digging around, including a major refactor of my code, I think I've worked out what went wrong ....

I don't necessarily know WHY, but I'm gonna blame the REST APIs, working on the assumption that my code didn't change, and therefore something outside my control did ....

At one point, I even wondered whether Bash on my Mac had changed, perhaps as a result of the latest macOS updates, but I was then able to reproduce the problem on an Ubuntu VM, so kinda suspect that Bash is NOT the villain.

Rather than sharing the full script here, I've written a prototype that demonstrates the problem, and then the solution.

So here's the bit of code that fails: -

curl_output=$(curl -s -k http://www.bbc.com --write-out "|%{http_code}")

IFS='|' read -r RESPONSE HTTP_CODE <<< "$curl_output"

echo -e "Response is" $RESPONSE "\n"

echo -e "HTTP Code is" $HTTP_CODE "\n"

Now, to break it down a bit, we've got four lines of code; the first merely calls the BBC website, which actually responds with HTTP 301 ( redirect ).

This grabs the HTTP response code, and appends it to the end of the string, captured within the curl_output variable, prepended by a vertical bar ( | ) separator.

For reference, the HTTP response code is obtained using the cURL parameter --write-out "|%{http_code}".

The second line of code does two things: -

i. Sets the Internal File Separator (IFS) to the vertical bar ( | )
ii. Parses the curl_output variable, using the read command ( which is internal to Bash ), parsing the content of the variable into two sub-variables, RESPONSE and HTTP_CODE, using IFS to separate the two

The third and fourth lines of code merely output the content of the two sub-variables; I'm using echo -e merely to allow me to add the "\n" instruction to the end of the output, to make it look nice.

When I run the script, this is what I see: -

~/curlTest.sh 

Curl Output is   301 Moved Permanently

Moved Permanently

The document has moved here.
|301 

First Pass
==========

Response is  

HTTP Code is 

which demonstrates the problem that I was seeing with my real REST APIs.

Note that the output from the cURL command is returned, including the HTTP 301 response code: -

Curl Output is   301 Moved Permanently

Moved Permanently

The document has moved here.
|301 

but also note that, whilst the HTTP response is partially returned within the RESPONSE sub-variable, the HTTP return code is NOT returned via the HTTP_CODE variable.

There's a clue in this behaviour but it took me a while to find it.

I went down a rabbit hole for a while, replacing the use of IFS and the read command, with awk, because who doesn't love awk. This works BUT it does require one to know precisely how many columns are being returned in the output from the cURL command i.e. if I use  awk '{print $3}' I need to be sure that the thing I want will always be in column 3.

But, back to the root cause .... 

Notice that, in the above example, the value of the RESPONSE sub-variable is only PART of the actual curl_output variable, specifically the first piece of HTML: -


This gave me a clue; for some reason, the read command was treating that first element as the entirety of the response, and not seeing the vertical bar separator, meaning that IFS was, in essence, being ignored, meaning that the HTTP_CODE variable never gets populated.

So there's something between: -


and the next element of the curl_output variable: -


such as a space character ( 0x20 ).

I even resorted to using the hexdump and hexedit commands ( via brew install hexedit on macOS and apt-get install -y hexedit on Ubuntu ), to confirm what I was seeing: -

00000000   43 75 72 6C  20 4F 75 74  70 75 74 20  69 73 20 20  3C 21 44 4F  43 54 59 50  45 20 48 54  4D 4C 20 50  55 42 4C 49  43 20 22 2D  2F 2F 49 45  Curl Output is  
0000002C   54 46 2F 2F  44 54 44 20  48 54 4D 4C  20 32 2E 30  2F 2F 45 4E  22 3E 20 3C  68 74 6D 6C  3E 3C 68 65  61 64 3E 20  3C 74 69 74  6C 65 3E 33  TF//DTD HTML 2.0//EN"> 3</font></div> <div> <font face="Courier New, Courier, monospace">00000058   30 31 20 4D  6F 76 65 64  20 50 65 72  6D 61 6E 65  6E 74 6C 79  3C 2F 74 69  74 6C 65 3E  20 3C 2F 68  65 61 64 3E  3C 62 6F 64  79 3E 20 3C  01 Moved Permanently <
00000084   68 31 3E 4D  6F 76 65 64  20 50 65 72  6D 61 6E 65  6E 74 6C 79  3C 2F 68 31  3E 20 3C 70  3E 54 68 65  20 64 6F 63  75 6D 65 6E  74 20 68 61  h1>Moved Permanently
The document ha
000000B0   73 20 6D 6F  76 65 64 20  3C 61 20 68  72 65 66 3D  22 68 74 74  70 73 3A 2F  2F 77 77 77  2E 62 62 63  2E 63 6F 6D  2F 22 3E 68  65 72 65 3C  s moved here<
000000DC   2F 61 3E 2E  3C 2F 70 3E  20 3C 2F 62  6F 64 79 3E  3C 2F 68 74  6D 6C 3E 20  7C 33 30 31  20 0A 0A                                            /a>.
|301 ..

( For reference, the ASCII codes are documented here ).

Interestingly, there are multiple space characters ( 0x20 ) in the output, between DOCTYPE and HTML and PUBLIC etc. and yet ......

I'm not 100% sure why read would treat them differently but ..... c'est la vie.

This led me down a different path, reading up on the read command ( apologies for the punning ) led me to the -d parameter: -

-d delim continue until the first character of DELIM is read, rather than newline

This allowed me to tell read to effectively ignore space characters, rather than delimiting at the "first" space.

Thus I changed my script: -

IFS='|' read -d '' -r RESPONSE HTTP_CODE <<< "$curl_output"
echo -e "Response is" $RESPONSE "\n"
echo -e "HTTP Code is" $HTTP_CODE "\n"

adding in -d ''

Having created a script that does both types of read, one without -d and one with -d, this is what I see: -

Curl Output is   301 Moved Permanently

Moved Permanently

The document has moved here.
|301 

First Pass
==========

Response is  

HTTP Code is 

Second Pass
===========

Response is 301 Moved Permanently

Moved Permanently

The document has moved here.
 

HTTP Code is 301 

Note that, in the second pass, I see the entirety of the curl_output variable, specifically that before the vertical bar ( | ) separator in the RESPONSE sub-variable and, more importantly, I see the HTTP response code ( 301 ) in the HTTP_CODE sub-variable.

This is the entire script: -

#!/bin/bash

curl_output=$(curl -s -k http://www.bbc.com --write-out "|%{http_code}")

echo -e "Curl Output is " $curl_output "\n"

echo -e "First Pass"
echo -e "==========\n"

IFS='|' read -r RESPONSE HTTP_CODE <<< "$curl_output"
echo -e "Response is" $RESPONSE "\n"
echo -e "HTTP Code is" $HTTP_CODE "\n"

echo -e "Second Pass"
echo -e "===========\n"

IFS='|' read -d '' -r RESPONSE HTTP_CODE <<< "$curl_output"
echo -e "Response is" $RESPONSE "\n"
echo -e "HTTP Code is" $HTTP_CODE "\n"

Further more, whilst writing this post, with this little prototype script, I discovered something else that validated my original hypothesis, that the REST APIs had changed between the original script working and breaking ....

Recognising that the BBC homepage isn't really a good example of a REST API, even though cURL doesn't really care, I changed my script to use a "real" REST REST API ( I am using the Dummy REST API Example ).


When I run my script, as per the above, I get this: -

Curl Output is  {"status":"success","data":[{"id":"1","employee_name":"Tiger Nixon","employee_salary":"320800","employee_age":"61","profile_image":""},{"id":"2","employee_name":"Garrett Winters","employee_salary":"170750","employee_age":"63","profile_image":""},{"id":"3","employee_name":"Ashton Cox","employee_salary":"86000","employee_age":"66","profile_image":""},{"id":"4","employee_name":"Cedric Kelly","employee_salary":"433060","employee_age":"22","profile_image":""},{"id":"5","employee_name":"Airi Satou","employee_salary":"162700","employee_age":"33","profile_image":""},{"id":"6","employee_name":"Brielle Williamson","employee_salary":"372000","employee_age":"61","profile_image":""},{"id":"7","employee_name":"Herrod Chandler","employee_salary":"137500","employee_age":"59","profile_image":""},{"id":"8","employee_name":"Rhona Davidson","employee_salary":"327900","employee_age":"55","profile_image":""},{"id":"9","employee_name":"Colleen Hurst","employee_salary":"205500","employee_age":"39","profile_image":""},{"id":"10","employee_name":"Sonya Frost","employee_salary":"103600","employee_age":"23","profile_image":""},{"id":"11","employee_name":"Jena Gaines","employee_salary":"90560","employee_age":"30","profile_image":""},{"id":"12","employee_name":"Quinn Flynn","employee_salary":"342000","employee_age":"22","profile_image":""},{"id":"13","employee_name":"Charde Marshall","employee_salary":"470600","employee_age":"36","profile_image":""},{"id":"14","employee_name":"Haley Kennedy","employee_salary":"313500","employee_age":"43","profile_image":""},{"id":"15","employee_name":"Tatyana Fitzpatrick","employee_salary":"385750","employee_age":"19","profile_image":""},{"id":"16","employee_name":"Michael Silva","employee_salary":"198500","employee_age":"66","profile_image":""},{"id":"17","employee_name":"Paul Byrd","employee_salary":"725000","employee_age":"64","profile_image":""},{"id":"18","employee_name":"Gloria Little","employee_salary":"237500","employee_age":"59","profile_image":""},{"id":"19","employee_name":"Bradley Greer","employee_salary":"132000","employee_age":"41","profile_image":""},{"id":"20","employee_name":"Dai Rios","employee_salary":"217500","employee_age":"35","profile_image":""},{"id":"21","employee_name":"Jenette Caldwell","employee_salary":"345000","employee_age":"30","profile_image":""},{"id":"22","employee_name":"Yuri Berry","employee_salary":"675000","employee_age":"40","profile_image":""},{"id":"23","employee_name":"Caesar Vance","employee_salary":"106450","employee_age":"21","profile_image":""},{"id":"24","employee_name":"Doris Wilder","employee_salary":"85600","employee_age":"23","profile_image":""}]}|200 

First Pass
==========

Response is {"status":"success","data":[{"id":"1","employee_name":"Tiger Nixon","employee_salary":"320800","employee_age":"61","profile_image":""},{"id":"2","employee_name":"Garrett Winters","employee_salary":"170750","employee_age":"63","profile_image":""},{"id":"3","employee_name":"Ashton Cox","employee_salary":"86000","employee_age":"66","profile_image":""},{"id":"4","employee_name":"Cedric Kelly","employee_salary":"433060","employee_age":"22","profile_image":""},{"id":"5","employee_name":"Airi Satou","employee_salary":"162700","employee_age":"33","profile_image":""},{"id":"6","employee_name":"Brielle Williamson","employee_salary":"372000","employee_age":"61","profile_image":""},{"id":"7","employee_name":"Herrod Chandler","employee_salary":"137500","employee_age":"59","profile_image":""},{"id":"8","employee_name":"Rhona Davidson","employee_salary":"327900","employee_age":"55","profile_image":""},{"id":"9","employee_name":"Colleen Hurst","employee_salary":"205500","employee_age":"39","profile_image":""},{"id":"10","employee_name":"Sonya Frost","employee_salary":"103600","employee_age":"23","profile_image":""},{"id":"11","employee_name":"Jena Gaines","employee_salary":"90560","employee_age":"30","profile_image":""},{"id":"12","employee_name":"Quinn Flynn","employee_salary":"342000","employee_age":"22","profile_image":""},{"id":"13","employee_name":"Charde Marshall","employee_salary":"470600","employee_age":"36","profile_image":""},{"id":"14","employee_name":"Haley Kennedy","employee_salary":"313500","employee_age":"43","profile_image":""},{"id":"15","employee_name":"Tatyana Fitzpatrick","employee_salary":"385750","employee_age":"19","profile_image":""},{"id":"16","employee_name":"Michael Silva","employee_salary":"198500","employee_age":"66","profile_image":""},{"id":"17","employee_name":"Paul Byrd","employee_salary":"725000","employee_age":"64","profile_image":""},{"id":"18","employee_name":"Gloria Little","employee_salary":"237500","employee_age":"59","profile_image":""},{"id":"19","employee_name":"Bradley Greer","employee_salary":"132000","employee_age":"41","profile_image":""},{"id":"20","employee_name":"Dai Rios","employee_salary":"217500","employee_age":"35","profile_image":""},{"id":"21","employee_name":"Jenette Caldwell","employee_salary":"345000","employee_age":"30","profile_image":""},{"id":"22","employee_name":"Yuri Berry","employee_salary":"675000","employee_age":"40","profile_image":""},{"id":"23","employee_name":"Caesar Vance","employee_salary":"106450","employee_age":"21","profile_image":""},{"id":"24","employee_name":"Doris Wilder","employee_salary":"85600","employee_age":"23","profile_image":""}]} 

HTTP Code is 200 

Second Pass
===========

Response is {"status":"success","data":[{"id":"1","employee_name":"Tiger Nixon","employee_salary":"320800","employee_age":"61","profile_image":""},{"id":"2","employee_name":"Garrett Winters","employee_salary":"170750","employee_age":"63","profile_image":""},{"id":"3","employee_name":"Ashton Cox","employee_salary":"86000","employee_age":"66","profile_image":""},{"id":"4","employee_name":"Cedric Kelly","employee_salary":"433060","employee_age":"22","profile_image":""},{"id":"5","employee_name":"Airi Satou","employee_salary":"162700","employee_age":"33","profile_image":""},{"id":"6","employee_name":"Brielle Williamson","employee_salary":"372000","employee_age":"61","profile_image":""},{"id":"7","employee_name":"Herrod Chandler","employee_salary":"137500","employee_age":"59","profile_image":""},{"id":"8","employee_name":"Rhona Davidson","employee_salary":"327900","employee_age":"55","profile_image":""},{"id":"9","employee_name":"Colleen Hurst","employee_salary":"205500","employee_age":"39","profile_image":""},{"id":"10","employee_name":"Sonya Frost","employee_salary":"103600","employee_age":"23","profile_image":""},{"id":"11","employee_name":"Jena Gaines","employee_salary":"90560","employee_age":"30","profile_image":""},{"id":"12","employee_name":"Quinn Flynn","employee_salary":"342000","employee_age":"22","profile_image":""},{"id":"13","employee_name":"Charde Marshall","employee_salary":"470600","employee_age":"36","profile_image":""},{"id":"14","employee_name":"Haley Kennedy","employee_salary":"313500","employee_age":"43","profile_image":""},{"id":"15","employee_name":"Tatyana Fitzpatrick","employee_salary":"385750","employee_age":"19","profile_image":""},{"id":"16","employee_name":"Michael Silva","employee_salary":"198500","employee_age":"66","profile_image":""},{"id":"17","employee_name":"Paul Byrd","employee_salary":"725000","employee_age":"64","profile_image":""},{"id":"18","employee_name":"Gloria Little","employee_salary":"237500","employee_age":"59","profile_image":""},{"id":"19","employee_name":"Bradley Greer","employee_salary":"132000","employee_age":"41","profile_image":""},{"id":"20","employee_name":"Dai Rios","employee_salary":"217500","employee_age":"35","profile_image":""},{"id":"21","employee_name":"Jenette Caldwell","employee_salary":"345000","employee_age":"30","profile_image":""},{"id":"22","employee_name":"Yuri Berry","employee_salary":"675000","employee_age":"40","profile_image":""},{"id":"23","employee_name":"Caesar Vance","employee_salary":"106450","employee_age":"21","profile_image":""},{"id":"24","employee_name":"Doris Wilder","employee_salary":"85600","employee_age":"23","profile_image":""}]} 

HTTP Code is 200 

Yes, it works perfectly in both cases.

In other words, a real REST API that's returning JSON doesn't seem to break my script, regardless of whether I use read -d '' or not.

Which is nice.

However, I learned a lot whilst digging into this, both to fix the actual script AND to write this post.

And ....

EVERY DAY IS A SCHOOL DAY

No comments:

Visual Studio Code - Wow 🙀

Why did I not know that I can merely hit [cmd] [p]  to bring up a search box allowing me to search my project e.g. a repo cloned from GitHub...