URL UTF8 Encoder : UTF8 Byte Hex « Development Class « Java

Home
Java
1.2D Graphics GUI
2.3D
3.Advanced Graphics
4.Ant
5.Apache Common
6.Chart
7.Class
8.Collections Data Structure
9.Data Type
10.Database SQL JDBC
11.Design Pattern
12.Development Class
13.EJB3
14.Email
15.Event
16.File Input Output
17.Game
18.Generics
19.GWT
20.Hibernate
21.I18N
22.J2EE
23.J2ME
24.JDK 6
25.JNDI LDAP
26.JPA
27.JSP
28.JSTL
29.Language Basics
30.Network Protocol
31.PDF RTF
32.Reflection
33.Regular Expressions
34.Scripting
35.Security
36.Servlets
37.Spring
38.Swing Components
39.Swing JFC
40.SWT JFace Eclipse
41.Threads
42.Tiny Application
43.Velocity
44.Web Services SOA
45.XML
Java Tutorial
Java Book
Java Source Code / Java Documentation
Java Open Source
Jar File Download
Java Articles
Java Products
Java by API
SCJP
Java » Development Class » UTF8 Byte HexScreenshots 
URL UTF8 Encoder
 
/**
 * Provides a method to encode any string into a URL-safe
 * form.
 * Non-ASCII characters are first encoded as sequences of
 * two or three bytes, using the UTF-8 algorithm, before being
 * encoded as %HH escapes.
 *
 * Created: 17 April 1997
 * Author: Bert Bos <[email protected]>
 *
 * URLUTF8Encoder: http://www.w3.org/International/URLUTF8Encoder.java
 *
 * Copyright © 1997 World Wide Web Consortium, (Massachusetts
 * Institute of Technology, European Research Consortium for
 * Informatics and Mathematics, Keio University). All Rights Reserved.
 * This work is distributed under the W3C® Software License [1] in the
 * hope that it will be useful, but WITHOUT ANY WARRANTY; without even
 * the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
 * PURPOSE.
 *
 * [1] http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231
 */
//package com.substanceofcode.utils;

public class URLUTF8Encoder
{

  final static String[] hex = {
    "%00""%01""%02""%03""%04""%05""%06""%07",
    "%08""%09""%0a""%0b""%0c""%0d""%0e""%0f",
    "%10""%11""%12""%13""%14""%15""%16""%17",
    "%18""%19""%1a""%1b""%1c""%1d""%1e""%1f",
    "%20""%21""%22""%23""%24""%25""%26""%27",
    "%28""%29""%2a""%2b""%2c""%2d""%2e""%2f",
    "%30""%31""%32""%33""%34""%35""%36""%37",
    "%38""%39""%3a""%3b""%3c""%3d""%3e""%3f",
    "%40""%41""%42""%43""%44""%45""%46""%47",
    "%48""%49""%4a""%4b""%4c""%4d""%4e""%4f",
    "%50""%51""%52""%53""%54""%55""%56""%57",
    "%58""%59""%5a""%5b""%5c""%5d""%5e""%5f",
    "%60""%61""%62""%63""%64""%65""%66""%67",
    "%68""%69""%6a""%6b""%6c""%6d""%6e""%6f",
    "%70""%71""%72""%73""%74""%75""%76""%77",
    "%78""%79""%7a""%7b""%7c""%7d""%7e""%7f",
    "%80""%81""%82""%83""%84""%85""%86""%87",
    "%88""%89""%8a""%8b""%8c""%8d""%8e""%8f",
    "%90""%91""%92""%93""%94""%95""%96""%97",
    "%98""%99""%9a""%9b""%9c""%9d""%9e""%9f",
    "%a0""%a1""%a2""%a3""%a4""%a5""%a6""%a7",
    "%a8""%a9""%aa""%ab""%ac""%ad""%ae""%af",
    "%b0""%b1""%b2""%b3""%b4""%b5""%b6""%b7",
    "%b8""%b9""%ba""%bb""%bc""%bd""%be""%bf",
    "%c0""%c1""%c2""%c3""%c4""%c5""%c6""%c7",
    "%c8""%c9""%ca""%cb""%cc""%cd""%ce""%cf",
    "%d0""%d1""%d2""%d3""%d4""%d5""%d6""%d7",
    "%d8""%d9""%da""%db""%dc""%dd""%de""%df",
    "%e0""%e1""%e2""%e3""%e4""%e5""%e6""%e7",
    "%e8""%e9""%ea""%eb""%ec""%ed""%ee""%ef",
    "%f0""%f1""%f2""%f3""%f4""%f5""%f6""%f7",
    "%f8""%f9""%fa""%fb""%fc""%fd""%fe""%ff"
  };

  /**
   * Encode a string to the "x-www-form-urlencoded" form, enhanced
   * with the UTF-8-in-URL proposal. This is what happens:
   *
   * <ul>
   * <li><p>The ASCII characters 'a' through 'z', 'A' through 'Z',
   *        and '0' through '9' remain the same.
   *
   * <li><p>The unreserved characters - _ . ! ~ * ' ( ) remain the same.
   *
   * <li><p>The space character ' ' is converted into a plus sign '+'.
   *
   * <li><p>All other ASCII characters are converted into the
   *        3-character string "%xy", where xy is
   *        the two-digit hexadecimal representation of the character
   *        code
   *
   * <li><p>All non-ASCII characters are encoded in two steps: first
   *        to a sequence of 2 or 3 bytes, using the UTF-8 algorithm;
   *        secondly each of these bytes is encoded as "%xx".
   * </ul>
   *
   @param s The string to be encoded
   @return The encoded string
   */
  public static String encode(String s)
  {
    StringBuffer sbuf = new StringBuffer();
    int len = s.length();
    for (int i = 0; i < len; i++) {
      int ch = s.charAt(i);
      if ('A' <= ch && ch <= 'Z') {    // 'A'..'Z'
        sbuf.append((char)ch);
      else if ('a' <= ch && ch <= 'z') {  // 'a'..'z'
         sbuf.append((char)ch);
      else if ('0' <= ch && ch <= '9') {  // '0'..'9'
         sbuf.append((char)ch);
      else if (ch == ' ') {      // space
         sbuf.append('+');
      else if (ch == '-' || ch == '_'    // unreserved
          || ch == '.' || ch == '!'
          || ch == '~' || ch == '*'
          || ch == '\'' || ch == '('
          || ch == ')') {
        sbuf.append((char)ch);
      else if (ch <= 0x007f) {    // other ASCII
         sbuf.append(hex[ch]);
      else if (ch <= 0x07FF) {    // non-ASCII <= 0x7FF
         sbuf.append(hex[0xc0 (ch >> 6)]);
         sbuf.append(hex[0x80 (ch & 0x3F)]);
      else {          // 0x7FF < ch <= 0xFFFF
         sbuf.append(hex[0xe0 (ch >> 12)]);
         sbuf.append(hex[0x80 ((ch >> 60x3F)]);
         sbuf.append(hex[0x80 (ch & 0x3F)]);
      }
    }
    return sbuf.toString();
  }

}

   
  
Related examples in the same category
1.Convert file in SJIS to UTF8
2.Return an UTF-8 encoded String
3.Return an UTF-8 encoded String by length
4.UTF8 String utilities
5.Return UTF-8 encoded byte[] representation of a String
6.Encodes octects (using utf-8) into Hex data
7.Decodes values of attributes in the DN encoded in hex into a UTF-8 String.
8.converting between byte arrays and hex encoded strings
9.Convert bytes To Hex
10.Convert hex To Bytes
11.Unicode 2 ASCII
12.Make bytes
13.String converterString converter
14.Show unicode stringShow unicode string
15.Normalizer
16.Convert from UTF-8 to Unicode
17.Convert from Unicode to UTF-8
18.Utility methods for handling UTF-8 encoded byte streams.
19.Read Windows Notepad Unicode files
20.UTF Util
21.To UTF8 InputStream
22.Returns {@code true} if the specified character sequence is a valid sequence of UTF-16 char values.
w___w___w.j___a_v_a2___s___._c_om__ | Contact Us
Copyright 2009 - 12 Demo Source and Support. All rights reserved.
All other trademarks are property of their respective owners.