How to urlencode data into a URL, with bash or curl

Question

How can a string be urlencoded and embedded into the URL? Please note that I am not trying to GET or POST data, so the -G and --data and --data-urlencode options of curl don't seem to do the job.

For example, if you used

curl -G http://example.com/foo --data-urlencode "bar=spaced data"

that would be functionally equivalent to

curl "http://example.com/foo?bar=spaced%20data"

which is not desired.

I have a string foo/bar which must be urlencoded as foo%2fbar and embedded into the path of the URL:

curl http://example.com/api/projects/foo%2fbar/events

One hypothetical solution (if I could find something like this) would be to preprocess the data in bash, if there exists some kind of urlencode function.

DATA=foo/bar
ENCODED=`urlencode $DATA`
curl http://example.com/api/projects/${ENCODED}/events

Another hypothetical solution (if I could find something like this) would be some switch in curl, similar to this:

curl http://example.com/api/projects/{0}/events --string-urlencode "0=foo/bar"

The specific reason I'm looking for an answer to this question is the GitLab API. For example, in GitLab's "get single project" call, NAMESPACE/PROJECT_NAME is URL-encoded, e.g. /api/v3/projects/diaspora%2Fdiaspora (where / is represented by %2F). Further to this, you can request individual properties of the project, so you end up with a URL such as http://example.com/projects/diaspora%2Fdiaspora/events

Although this question is GitLab-specific, I imagine it applies to REST APIs in general, and I'm surprised I can't find a pre-existing answer on Stack Overflow or via internet search.

urlencode $DATA is going to behave badly if DATA='*' (it would expand the glob, encoding a list of filenames); needs to be "$DATA". Also, see pubs.opengroup.org/onlinepubs/009695399/basedefs/… guidelines re: variable names (fourth paragraph): Shell and system tools use all-upper-case names for variables that impact their operation; names with at least one lower-case character are reserved for application use. Since environment variables and shell variables share a namespace, this applies to regular (non-exported) names as well.
– Charles Duffy, Commented May 18, 2016 at 20:37
BTW, did you try the answer by Orwellophile in the not-quite-dupe-but-closely-related question at stackoverflow.com/a/10660730/14122?
– Charles Duffy, Commented May 18, 2016 at 20:44
Seems pretty close to this question: stackoverflow.com/questions/29755942/… has among others also a perl solution that you can craft into a function.
– user3277192, Commented May 19, 2016 at 0:46
So far yes, I'm able to find an implementation of urlencode() (found here) that can be pasted into a bash script. It's a solution, but I'm doubtful it's the best solution.
– Edward Ned Harvey, Commented May 19, 2016 at 13:23
this answer is a one liner for it: stackoverflow.com/a/10797966/1839558
– lauksas, Commented Aug 13, 2020 at 15:35

Charles Duffy · Accepted Answer · 2016-05-18 20:40:22Z

15

The urlencode function you propose is easy enough to implement:

urlencode() {
  python -c 'import urllib, sys; print urllib.quote(sys.argv[1], sys.argv[2])' \
    "$1" "$urlencode_safe"
}

...used as:

data=foo/bar
encoded=$(urlencode "$data")
curl "http://example.com/api/projects/${encoded}/events"

If you want to have some characters which are passed through literally -- in many use cases, this is desired for /s -- instead use:

encoded=$(urlencode_safe='/' urlencode "$data")
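The function above relies on Python 2's urllib.quote, which no longer exists under that name in Python 3. A minimal sketch of a Python 3 port follows, assuming python3 is on the PATH; the function name and the urlencode_safe variable are kept from the answer above, and passing safe explicitly is a choice made here so that an empty value still encodes slashes.

```bash
#!/usr/bin/env bash
# Python 3 port of the urlencode function above (assumes python3 is installed).
# $1 is the string to encode; $urlencode_safe lists characters to leave as-is.
urlencode() {
  python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=sys.argv[2]))' \
    "$1" "${urlencode_safe-}"
}

data=foo/bar
urlencode "$data"                      # prints foo%2Fbar
urlencode_safe='/' urlencode "$data"   # prints foo/bar
```

Usage is unchanged: encoded=$(urlencode "$data") still produces a value that can be embedded in the URL path.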

edited May 18, 2016 at 20:40

answered May 18, 2016 at 20:35

Charles Duffy


5 Comments

rici Over a year ago

FWIW, if you happen to have jq installed, you can use the slightly shorter printf %s "$1" | jq -s -R -r @uri

dimo414 Over a year ago

Nice trick; a bit shorter (-s doesn't appear necessary, but I'm not sure): jq -R -r @uri <<<"$1"

jbustamovej Over a year ago

@rici great comment. This should be a separate answer!

Adam Mackler Over a year ago

Shorter still (if that's what you want) combine -R -r to -Rr

Troy Daniels Over a year ago

In python 3, you need to use urllib.parse.quote and import urllib.parse

Julio Batista Silva · Answer · 2024-03-21 14:54:33Z

1

One-liner in Python3:

python3 -c "from urllib.parse import quote; print(quote(input('Type here: ')))"

answered Mar 21, 2024 at 14:54

Julio Batista Silva


chronospoon · Answer · 2024-05-17 20:56:50Z

1

One that supports multiple input lines, building on Julio's answer:

python3 -c "import sys, urllib.parse; print(urllib.parse.quote(sys.stdin.read()))"

Which lets me do this on macOS (copy something to the clipboard, then send it to test an endpoint):

alias urlencode='python3 -c "import sys, urllib.parse; print(urllib.parse.quote(sys.stdin.read()))"'

curl -X 'POST' -H 'accept: application/json' \
    "http://127.0.0.1:11434/generate?content=$(pbpaste | urlencode)"

answered May 17, 2024 at 20:56

chronospoon


2 Comments

not2qubit Over a year ago

...and hit [CTRL-Z] after you are done entering.

chronospoon Over a year ago

Do you mean Ctrl-D (end of input)? Ctrl-Z generally suspends the process, which I think would prevent any output from appearing?
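One caveat for the sys.stdin.read() one-liner in this answer: whatever newline ends the input (for example from echo, or from pressing Enter before Ctrl-D) is encoded too, so the result ends in %0A. A hedged sketch of a variant that strips trailing newlines first, assuming python3 is on the PATH; urlencode_stdin is a hypothetical name, and safe="" is a choice made here so that / is encoded as well.

```bash
#!/usr/bin/env bash
# Hypothetical helper: URL-encode everything read from stdin,
# dropping trailing newlines so they do not end up as %0A.
urlencode_stdin() {
  python3 -c 'import sys, urllib.parse
data = sys.stdin.read().rstrip("\n")
print(urllib.parse.quote(data, safe=""))'
}

printf 'spaced data\n' | urlencode_stdin   # prints spaced%20data
```

A function is used instead of an alias because bash does not expand aliases in non-interactive scripts by default.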

tlex · Answer · 2024-06-13 09:27:49Z

1

Expanding on @guilherme-z-santos's answer:

ue() { local in=$1; if [ -z "$in" ]; then read -r in; fi; printf '%s\n' "$in" | jq -Rr '@uri'; }

The jq parameter -s adds an unwanted %0A at the end, so it was dropped. Also, for it to be a proper function in bash, it needs a space after the opening { and a ; before the closing }. The value is passed to jq quoted, so whitespace and glob characters in the input are left alone.

It can now be used:

$ ue ffoooä
ffooo%C3%A4
$ ue ffooo
ffooo
$ echo foooä|ue
fooo%C3%A4

answered Jun 13, 2024 at 9:27

tlex


Guilherme Z. Santos · Answer · 2024-03-06 19:40:49Z

0

Adding to @rici's comment on the accepted answer (which has more upvotes than the answer itself), we may create the function ue (short for urlencode, but you may call it as you wish) on top of jq and make it read from stdin or from an argument:

ue() {local in=$1; if [ -z "$in" ]; then read in; fi; echo $in|jq -sRr '@uri'}

echo "https://example.com/$(ue "some part")/?date=$(date|ue)"

One-liner, no Python, just jq, plain and simple...

answered Mar 6, 2024 at 19:40

Guilherme Z. Santos


Adam D. · Answer · 2024-08-04 20:57:43Z

Since the question asks how to urlencode with bash or curl, here is a sed-based function that reads from stdin:

function urlencode() {
    sed "s/\x25/%25/g;s/\x20/%20/g;s/\x21/%21/g;s/\x22/%22/g;s/\x23/%23/g;s/\x5c\x24/%24/g;\
        s/\x26/%26/g;s/\x27/%27/g;s/\x28/%28/g;s/\x29/%29/g;s/\x2A/%2A/g;s/\x2B/%2B/g;\
        s/\x2C/%2C/g;s/\x2D/%2D/g;s/\x3A/%3A/g;s/\x3F/%3F/g;s/\x7C/%7C/g;s/\x5c\x5B/%5B/g"
}

You can add another to encode the / separately:

function encode_slash() { sed 's/\x2F/%2F/g'; }

This should work for most cases, but if you need to handle Unicode, those characters have to be converted separately. The script below gives you the functionality of jq -jRr '@uri'; jq is written in C, so it will be much quicker on large amounts of Unicode, but the script is good enough for occasional Unicode characters:

#!/bin/bash
 
## Written by Adam Danischewski 08/04/2024 

declare CURR_ORD 

str="${1:-😄.mp4}"

function ord() {
    printf -v CURR_ORD "%d" "\"$1"
}

function has_unicode() { 
 local input="$1"
 local -i charcnt=$(wc -m <<<"$input")
 local -i bytecnt=$(wc -c <<<"$input")
 ((charcnt!=bytecnt))
 return $?
}

function urlencode() {
    sed "s/\x25/%25/g;s/\x20/%20/g;s/\x21/%21/g;s/\x22/%22/g;s/\x23/%23/g;s/\x5c\x24/%24/g;\
        s/\x26/%26/g;s/\x27/%27/g;s/\x28/%28/g;s/\x29/%29/g;s/\x2A/%2A/g;s/\x2B/%2B/g;\
        s/\x2C/%2C/g;s/\x2D/%2D/g;s/\x3A/%3A/g;s/\x3F/%3F/g;s/\x7C/%7C/g;s/\x5c\x5B/%5B/g"
}

function encode_unicode() { 
for ((i=0;i<${#str};i++)); do
    char=${str:i:1}
    ord "$char"
    if ((${#CURR_ORD}>3)); then 
     od -t x1 <<< "$char" | awk '{$1="";gsub("^[[:space:]]*","");for(i=1;i<NF;i++) printf "%%" toupper($i);}'
    else 
     printf "%s" "$char" 
    fi 
done
}

## Tokenize percents before encoding unicode 
function tokenize_orig_pcts() {
  sed 's/%/\x01/g'
} 

## Tokenize percents after encoding unicode, since this is urlencoded..  
function tokenize_pcts() {
  sed 's/%/\x02/g'
} 

function detokenize_orig_pcts() {
  sed 's/\x01/%/g'
} 

function detokenize_pcts() {
  sed 's/\x02/%/g'
} 

function urlencode() { 
 sed "s/\x25/%25/g;s/\x20/%20/g;s/\x21/%21/g;s/\x22/%22/g;s/\x23/%23/g;s/\x5c\x24/%24/g;\
        s/\x26/%26/g;s/\x27/%27/g;s/\x28/%28/g;s/\x29/%29/g;s/\x2A/%2A/g;s/\x2B/%2B/g;\
        s/\x2C/%2C/g;s/\x3A/%3A/g;s/\x3F/%3F/g;s/\x7C/%7C/g;s/\x5c\x5B/%5B/g"
}

function main() { 
  if has_unicode "$str"; then 
    str=$(tokenize_orig_pcts <<< "$str")
    str=$(encode_unicode)
    str=$(tokenize_pcts <<< "$str")
    str=$(detokenize_orig_pcts <<< "$str")
    str=$(urlencode <<< "$str")
    detokenize_pcts <<< "$str"
  else 
    urlencode <<< "$str"
  fi  
}

main
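The Unicode branch of the script above already uses od to obtain raw bytes. The same idea can be applied to the whole string, which avoids listing each character in sed. This is a minimal sketch, not the author's method: urlencode_bytes is a hypothetical name, it assumes bash 4 or later (for ${var^^}) and a POSIX od, and it keeps only the RFC 3986 unreserved characters (letters, digits, - . _ ~) unencoded.

```bash
#!/usr/bin/env bash
# Hypothetical helper: percent-encode every byte of $1 except
# RFC 3986 unreserved characters (A-Z a-z 0-9 - . _ ~).
urlencode_bytes() {
  local hex byte out=''
  # od prints each input byte as a two-digit lowercase hex number.
  for hex in $(printf '%s' "$1" | od -An -v -tx1); do
    case $hex in
      3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a]|2d|2e|5f|7e)
        # Unreserved byte: turn the hex code back into the character.
        printf -v byte "\\x$hex"
        out+=$byte ;;
      *)
        # Everything else becomes %XX with uppercase hex digits.
        out+=%${hex^^} ;;
    esac
  done
  printf '%s\n' "$out"
}

urlencode_bytes 'foo/bar'       # prints foo%2Fbar
urlencode_bytes 'spaced data'   # prints spaced%20data
```

Because it works on bytes rather than characters, multi-byte UTF-8 input is encoded byte by byte without depending on the current locale.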

RARE Kpop Manifesto · Answer · 2024-08-05 17:38:49Z

For larger inputs, a recursive awk function would be barely slower than python3's built-in urllib.parse.quote(), while jq is by far the slowest:

(in these benchmarks, awk always went first in order to eliminate any possibility it benefits from system caching)

      in0: 39.0MiB 0:00:00 [ 418MiB/s] [ 418MiB/s] [==>] 100%            
     out9: 74.6MiB 0:00:01 [55.7MiB/s] [55.7MiB/s] [<=>]

( pvE 0.1 in0 < "$___" | mawkUx ; )  

1.27s user 0.09s system 99% cpu 1.365 total d083f07bbe4a3a55d14e2b6b2703c25d
 
      in0: 39.0MiB 0:00:00 [1.40GiB/s] [1.40GiB/s] [==>] 100%            
     out9: 74.6MiB 0:00:01 [63.4MiB/s] [63.4MiB/s] [<=>]

( pvE 0.1 in0 < "$___" | python3 -c ; )  

1.12s user 0.07s system 99% cpu 1.202 total d083f07bbe4a3a55d14e2b6b2703c25d
 
      in0: 39.0MiB 0:00:00 [ 317MiB/s] [ 317MiB/s] [==>] 100%            
     out9: 74.6MiB 0:00:02 [30.5MiB/s] [30.5MiB/s] [<=>]

( pvE 0.1 in0 < "$___" | jq -sRr '@uri'; ) 

2.39s user 0.07s system 99% cpu 2.472 total d083f07bbe4a3a55d14e2b6b2703c25d

Once in a blue moon, awk is even faster than python's built-in:

      in0:  153MiB 0:00:00 [1.23GiB/s] [1.23GiB/s] [==>] 100%            
     out9:  199MiB 0:00:01 [ 102MiB/s] [ 102MiB/s] [ <=>]

( pvE 0.1 in0 < "$___" | mawkUx ; )  

1.68s user 0.26s system 98% cpu 1.979 total 827f416a5302a6fad2f844a86d9a4c56
 
      in0:  153MiB 0:00:00 [2.38GiB/s] [2.38GiB/s] [==>] 100%            
     out9:  199MiB 0:00:03 [56.6MiB/s] [56.6MiB/s] [ <=> ]

( pvE 0.1 in0 < "$___" | python3 -c ; )  

3.27s user 0.26s system 99% cpu 3.560 total 827f416a5302a6fad2f844a86d9a4c56

      in0:  153MiB 0:00:01 [86.0MiB/s] [86.0MiB/s] [==>] 100%            
     out9:  199MiB 0:00:06 [32.3MiB/s] [32.3MiB/s] [<=> ]

( pvE 0.1 in0 < "$___" | jq -sRr '@uri'; ) 

6.03s user 0.18s system 99% cpu 6.226 total 827f416a5302a6fad2f844a86d9a4c56

Calling this function with no arguments at all defaults to URL-encoding all of $0:

function urlencode_rec(__, _, ___, ____) {

    if (_)
        if (!___ ? _^(_^_^_ - _) < (____ = length(__)) \
                                 : (____ = __ - ___) < _)
            return ___ \
                ? urlencode_rec(__, _,
                   __ += (____ - ____%_) / _) urlencode_rec(++__, _, ___) \
                : urlencode_rec(substr(__, !!_, _ = (____ - ____%_) / _),
                  (__ = substr(__, ++_))^(_ = "") + !_) urlencode_rec((__)_,
                                             (___ = ! (__ = _)) + ___)
        else
            return substr(_, (__ = urlencode((_ = !_ < _) ? __ : \
                   substr($(_++), _ + (_ *= (++_*_*_)^_^_) * --__,
                      (___ - __) * _)))^!_, -gsub(/\+/, "%20", __))__

    else if ((____ = (_ = substr(_, _, _)) + !__) &&
                     (__ == _) * (__ == (_ < _)) < ____)
        return __

    else if ((____ = ____ ? -length() : length(__)) <= (\
                   ___ = (_ += ++_) * (_*_*_)^_^_) && -___ <= ____)

        return substr(_ = !_, _, ____ && ((__ = urlencode(_ < ____ \
                        ? __ : $_))^_ < -gsub(/\+/, "%20", __)))__

    else if (____ < !_ &&
                    __ = ((__ = -____) - (__ %= ___)) / ___ + !!__)
        return \
        (___ = __ <= _) ? urlencode_rec(___, -_, __) \
                        : urlencode_rec(!___, -_, ___ = (__ - __%_) / _) \
                          urlencode_rec(++___, -_, __)
    else
        return urlencode_rec(substr(__, ___ = !!_,
                 ____ = (____ - ____%_) / _), ___ += (__ = substr(__,
               ++____))^(_ = "")) urlencode_rec((__)_, (__ = _) + ___)
}
