15

How can a string be urlencoded and embedded into the URL? Please note that I am not trying to GET or POST data, so the -G and --data and --data-urlencode options of curl don't seem to do the job.

For example, if you used

curl -G http://example.com/foo --data-urlencode "bar=spaced data"

that would be functionally equivalent to

curl "http://example.com/foo?bar=spaced%20data"

which is not desired.

I have a string foo/bar which must be urlencoded as foo%2fbar and embedded into the URL:

curl http://example.com/api/projects/foo%2fbar/events

One hypothetical solution (if I could find something like this) would be to preprocess the data in bash, if there exists some kind of urlencode function.

DATA=foo/bar
ENCODED=`urlencode $DATA`
curl http://example.com/api/projects/${ENCODED}/events

Another hypothetical solution (if I could find something like this) would be some switch in curl, similar to this:

curl http://example.com/api/projects/{0}/events --string-urlencode "0=foo/bar"

The specific reason I'm looking for an answer to this question is the GitLab API. For example, in GitLab's "get single project" call, NAMESPACE/PROJECT_NAME must be URL-encoded, e.g. /api/v3/projects/diaspora%2Fdiaspora (where / is represented by %2F). Beyond this, you can request individual properties of the project, so you end up with a URL such as http://example.com/projects/diaspora%2Fdiaspora/events

Although this question is GitLab-specific, I imagine it applies to REST APIs in general, and I'm surprised I can't find a pre-existing answer on Stack Overflow or via an internet search.

5
  • urlencode $DATA is going to behave badly if DATA='*' (it would expand the glob, encoding a list of filenames); needs to be "$DATA". Also, see pubs.opengroup.org/onlinepubs/009695399/basedefs/… guidelines re: variable names (fourth paragraph): Shell and system tools use all-upper-case names for variables that impact their operation; names with at least one lower-case character are reserved for application use. Since environment variables and shell variables share a namespace, this applies to regular (non-exported) names as well. Commented May 18, 2016 at 20:37
  • BTW, did you try the answer by Orwellophile in the not-quite-dupe-but-closely-related question at stackoverflow.com/a/10660730/14122? Commented May 18, 2016 at 20:44
  • Seems pretty close to this question: stackoverflow.com/questions/29755942/… has among others also a perl solution that you can craft into a function. Commented May 19, 2016 at 0:46
  • So far yes, I'm able to find an implementation of urlencode() (found here) that can be pasted into a bash script. It's a solution, but I'm doubtful it's the best solution. Commented May 19, 2016 at 13:23
  • this answer is a one liner for it: stackoverflow.com/a/10797966/1839558 Commented Aug 13, 2020 at 15:35

7 Answers

15

The urlencode function you propose is easy enough to implement:

urlencode() {
  python -c 'import urllib, sys; print urllib.quote(sys.argv[1], sys.argv[2])' \
    "$1" "$urlencode_safe"
}

...used as:

data=foo/bar
encoded=$(urlencode "$data")
curl "http://example.com/api/projects/${encoded}/events"

If you want to have some characters which are passed through literally -- in many use cases, this is desired for /s -- instead use:

encoded=$(urlencode_safe='/' urlencode "$data")
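
As a comment below notes, Python 3 moved quote into urllib.parse. A minimal sketch of the same function for Python 3, assuming python3 is on PATH; urlencode_safe is the same optional variable as above, and leaving it unset means / is encoded too:

```shell
urlencode() {
  # quote() leaves every character listed in "safe" unencoded;
  # an unset or empty urlencode_safe therefore encodes "/" as %2F.
  python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=sys.argv[2]))' \
    "$1" "${urlencode_safe-}"
}
```

With data=foo/bar, urlencode "$data" prints foo%2Fbar, and urlencode_safe='/' urlencode "$data" prints foo/bar.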

5 Comments

FWIW, if you happen to have jq installed, you can use the slightly shorter printf %s "$1" | jq -s -R -r @uri
Nice trick; a bit shorter (-s doesn't appear necessary, but I'm not sure): jq -R -r @uri <<<"$1"
@rici great comment. This should be a separate answer!
Shorter still (if that's what you want) combine -R -r to -Rr
In python 3, you need to use urllib.parse.quote and import urllib.parse
1

One-liner in Python3:

python3 -c "from urllib.parse import quote; print(quote(input('Type here: ')))"

1

One that supports multiple input lines, building on Julio's answers:

python3 -c "import sys, urllib.parse; print(urllib.parse.quote(sys.stdin.read()))"

Which lets me do this on macOS (copy something to the clipboard, then send it to test an endpoint):

alias urlencode='python3 -c "import sys, urllib.parse; print(urllib.parse.quote(sys.stdin.read()))"'

curl -X 'POST' -H 'accept: application/json' \
    "http://127.0.0.1:11434/generate?content=$(pbpaste | urlencode)"

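One thing to watch with sys.stdin.read(): a trailing newline in the input (for example one added by echo) is encoded as well, as %0A. A minimal sketch of a variant that strips trailing newlines first, assuming python3 is on PATH; the name urlencode_stdin is made up for illustration:

```shell
urlencode_stdin() {
  # Hypothetical helper: read all of stdin, drop trailing newlines,
  # then percent-encode (quote() keeps "/" as-is by default).
  python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.stdin.read().rstrip("\n")))'
}
```

For example, echo 'spaced data' | urlencode_stdin prints spaced%20data rather than spaced%20data%0A.
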
2 Comments

...and hit [CTRL-Z] after you are done entering.
Do you mean Ctrl-D (end of input)? Ctrl-Z generally suspends the process, which I think would prevent any output from appearing?
1

Expanding on @guilherme-z-santos's answer:

ue() { local in=$1; if [ -z "$in" ]; then read -r in; fi; printf '%s\n' "$in" | jq -Rr '@uri'; }

The jq option -s adds an unwanted %0A at the end, so it was dropped. Also, for the one-line function to parse in bash, it needs a space after the opening { and a ; before the closing }.

It can now be used:

$ ue ffoooä
ffooo%C3%A4
$ ue ffooo
ffooo
$ echo foooä|ue
fooo%C3%A4

0

Adding to @rici's comment on the accepted answer (which has more upvotes than the answer itself), we may create the function ue (short for urlencode, but you may call it as you wish) on top of jq and make it read from stdin or from an argument:

ue() {local in=$1; if [ -z "$in" ]; then read in; fi; echo $in|jq -sRr '@uri'}
echo "https://example.com/$(ue "some part")/?date=$(date|ue)"

One-liner, no Python, just jq, plain and simple...

0

Since the question asks how to urlencode with bash or curl, here is a version that needs only sed:

function urlencode() {
    sed "s/\x25/%25/g;s/\x20/%20/g;s/\x21/%21/g;s/\x22/%22/g;s/\x23/%23/g;s/\x5c\x24/%24/g;\
        s/\x26/%26/g;s/\x27/%27/g;s/\x28/%28/g;s/\x29/%29/g;s/\x2A/%2A/g;s/\x2B/%2B/g;\
        s/\x2C/%2C/g;s/\x2D/%2D/g;s/\x3A/%3A/g;s/\x3F/%3F/g;s/\x7C/%7C/g;s/\x5c\x5B/%5B/g"
}

You can add another to encode the / separately:

function encode_slash() { sed 's|/|%2F|g'; }

This should work for most cases, but if you need to handle Unicode, you have to convert those characters separately. That gives you the functionality of jq -jRr '@uri'; jq is written in C, so it will of course be much quicker on large amounts of Unicode. The script below is good enough for occasional Unicode characters:

#!/bin/bash
 
## Written by Adam Danischewski 08/04/2024 

declare CURR_ORD 

str="${1:-😄.mp4}"

function ord() {
    printf -v CURR_ORD "%d" "\"$1"
}

function has_unicode() { 
 local input="$1"
 local -i charcnt=$(wc -m <<<"$input")
 local -i bytecnt=$(wc -c <<<"$input")
 ((charcnt!=bytecnt))
 return $?
}

function urlencode() {
    sed "s/\x25/%25/g;s/\x20/%20/g;s/\x21/%21/g;s/\x22/%22/g;s/\x23/%23/g;s/\x5c\x24/%24/g;\
        s/\x26/%26/g;s/\x27/%27/g;s/\x28/%28/g;s/\x29/%29/g;s/\x2A/%2A/g;s/\x2B/%2B/g;\
        s/\x2C/%2C/g;s/\x2D/%2D/g;s/\x3A/%3A/g;s/\x3F/%3F/g;s/\x7C/%7C/g;s/\x5c\x5B/%5B/g"
}

function encode_unicode() { 
for ((i=0;i<${#str};i++)); do
    char=${str:i:1}
    ord "$char"
    if ((${#CURR_ORD}>3)); then 
     od -t x1 <<< "$char" | awk '{$1="";gsub("^[[:space:]]*","");for(i=1;i<NF;i++) printf "%%" toupper($i);}'
    else 
     printf "%s" "$char" 
    fi 
done
}

## Tokenize percents before encoding unicode 
function tokenize_orig_pcts() {
  sed 's/%/\x01/g'
} 

## Tokenize percents after encoding unicode, since this is urlencoded..  
function tokenize_pcts() {
  sed 's/%/\x02/g'
} 

function detokenize_orig_pcts() {
  sed 's/\x01/%/g'
} 

function detokenize_pcts() {
  sed 's/\x02/%/g'
} 

function urlencode() { 
 sed "s/\x25/%25/g;s/\x20/%20/g;s/\x21/%21/g;s/\x22/%22/g;s/\x23/%23/g;s/\x5c\x24/%24/g;\
        s/\x26/%26/g;s/\x27/%27/g;s/\x28/%28/g;s/\x29/%29/g;s/\x2A/%2A/g;s/\x2B/%2B/g;\
        s/\x2C/%2C/g;s/\x3A/%3A/g;s/\x3F/%3F/g;s/\x7C/%7C/g;s/\x5c\x5B/%5B/g"
}

function main() { 
  if has_unicode "$str"; then 
    str=$(tokenize_orig_pcts <<< "$str")
    str=$(encode_unicode)
    str=$(tokenize_pcts <<< "$str")
    str=$(detokenize_orig_pcts <<< "$str")
    str=$(urlencode <<< "$str")
    detokenize_pcts <<< "$str"
  else 
    urlencode <<< "$str"
  fi  
}

main
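
The same byte-by-byte idea can be written more compactly by letting od dump every byte of the string and percent-encoding each one that is not an RFC 3986 unreserved character. This is only a sketch, not the script above: the function name urlencode_bytes is invented here, and it assumes bash 3.1 or later (for printf -v and +=) and a POSIX od.

```shell
urlencode_bytes() {
  local hex chr out=
  # od -An -v -t x1 prints every input byte as two lowercase hex digits
  for hex in $(printf '%s' "$1" | od -An -v -t x1); do
    case $hex in
      # unreserved bytes: 0-9, A-Z, a-z, "-", ".", "_", "~"
      3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a]|2d|2e|5f|7e)
        printf -v chr "\\x$hex"           # turn the hex pair back into its character
        out+=$chr ;;
      *)
        printf -v chr '%%%02X' "0x$hex"   # e.g. 2f becomes %2F
        out+=$chr ;;
    esac
  done
  printf '%s\n' "$out"
}
```

Used as curl "http://example.com/api/projects/$(urlencode_bytes 'foo/bar')/events", this requests .../projects/foo%2Fbar/events. Because it works on raw bytes, multi-byte UTF-8 characters come out as one %XX per byte without any tokenizing pass.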

0

For larger inputs, a recursive awk function is barely slower than python3's built-in urllib.parse.quote(), while jq is by far the slowest:

(in these benchmarks, awk always went first in order to eliminate any possibility it benefits from system caching)

      in0: 39.0MiB 0:00:00 [ 418MiB/s] [ 418MiB/s] [==>] 100%            
     out9: 74.6MiB 0:00:01 [55.7MiB/s] [55.7MiB/s] [<=>]

( pvE 0.1 in0 < "$___" | mawkUx ; )  

1.27s user 0.09s system 99% cpu 1.365 total d083f07bbe4a3a55d14e2b6b2703c25d
 
      in0: 39.0MiB 0:00:00 [1.40GiB/s] [1.40GiB/s] [==>] 100%            
     out9: 74.6MiB 0:00:01 [63.4MiB/s] [63.4MiB/s] [<=>]

( pvE 0.1 in0 < "$___" | python3 -c ; )  

1.12s user 0.07s system 99% cpu 1.202 total d083f07bbe4a3a55d14e2b6b2703c25d
 
      in0: 39.0MiB 0:00:00 [ 317MiB/s] [ 317MiB/s] [==>] 100%            
     out9: 74.6MiB 0:00:02 [30.5MiB/s] [30.5MiB/s] [<=>]

( pvE 0.1 in0 < "$___" | jq -sRr '@uri'; ) 

2.39s user 0.07s system 99% cpu 2.472 total d083f07bbe4a3a55d14e2b6b2703c25d
 

Once in a blue moon, awk is even faster than python's built-in:

      in0:  153MiB 0:00:00 [1.23GiB/s] [1.23GiB/s] [==>] 100%            
     out9:  199MiB 0:00:01 [ 102MiB/s] [ 102MiB/s] [ <=>]

( pvE 0.1 in0 < "$___" | mawkUx ; )  

1.68s user 0.26s system 98% cpu 1.979 total 827f416a5302a6fad2f844a86d9a4c56
 
      in0:  153MiB 0:00:00 [2.38GiB/s] [2.38GiB/s] [==>] 100%            
     out9:  199MiB 0:00:03 [56.6MiB/s] [56.6MiB/s] [ <=> ]

( pvE 0.1 in0 < "$___" | python3 -c ; )  

3.27s user 0.26s system 99% cpu 3.560 total 827f416a5302a6fad2f844a86d9a4c56

      in0:  153MiB 0:00:01 [86.0MiB/s] [86.0MiB/s] [==>] 100%            
     out9:  199MiB 0:00:06 [32.3MiB/s] [32.3MiB/s] [<=> ]

( pvE 0.1 in0 < "$___" | jq -sRr '@uri'; ) 

6.03s user 0.18s system 99% cpu 6.226 total 827f416a5302a6fad2f844a86d9a4c56
 

Calling this function with no arguments at all defaults to URL-encoding all of $0:

function urlencode_rec(__, _, ___, ____) {

    if (_)
        if (!___ ? _^(_^_^_ - _) < (____ = length(__)) \
                                 : (____ = __ - ___) < _)
            return ___ \
                ? urlencode_rec(__, _,
                   __ += (____ - ____%_) / _) urlencode_rec(++__, _, ___) \
                : urlencode_rec(substr(__, !!_, _ = (____ - ____%_) / _),
                  (__ = substr(__, ++_))^(_ = "") + !_) urlencode_rec((__)_,
                                             (___ = ! (__ = _)) + ___)
        else
            return substr(_, (__ = urlencode((_ = !_ < _) ? __ : \
                   substr($(_++), _ + (_ *= (++_*_*_)^_^_) * --__,
                      (___ - __) * _)))^!_, -gsub(/\+/, "%20", __))__

    else if ((____ = (_ = substr(_, _, _)) + !__) &&
                     (__ == _) * (__ == (_ < _)) < ____)
        return __

    else if ((____ = ____ ? -length() : length(__)) <= (\
                   ___ = (_ += ++_) * (_*_*_)^_^_) && -___ <= ____)

        return substr(_ = !_, _, ____ && ((__ = urlencode(_ < ____ \
                        ? __ : $_))^_ < -gsub(/\+/, "%20", __)))__

    else if (____ < !_ &&
                    __ = ((__ = -____) - (__ %= ___)) / ___ + !!__)
        return \
        (___ = __ <= _) ? urlencode_rec(___, -_, __) \
                        : urlencode_rec(!___, -_, ___ = (__ - __%_) / _) \
                          urlencode_rec(++___, -_, __)
    else
        return urlencode_rec(substr(__, ___ = !!_,
                 ____ = (____ - ____%_) / _), ___ += (__ = substr(__,
               ++____))^(_ = "")) urlencode_rec((__)_, (__ = _) + ___)
}
