Thursday, December 27, 2018

"this app can break"

• When we type this statement in Notepad, save it, and then reopen the same file, we see the following output instead of the text

[][][][][][][][][]

This happens for the following reasons
----------------------------------------------------

• Phrases like “this app can break” are generally termed “hoaxes”.
• When we reopen a file containing such a sentence, Notepad tries to guess the file encoding and guesses wrong.
• When we save the text, Notepad stores it in ANSI format; when we open the same file, Notepad treats it as Unicode (UTF-16) instead.
• Notepad makes use of a built-in window class named "EDIT". Up to Windows 95, Fixedsys was the only available font for Notepad. Windows NT 4.0 and 98 introduced the ability to change this font. In Windows 2000 and XP the default font was changed to Lucida Console.
• Notepad can edit traditional 8-bit text files as well as Unicode text files such as UTF-8 and UTF-16.
• Notepad accepts text from the Windows ‘clipboard’. When clipboard data with multiple formats is pasted into Notepad, the program will only accept text in the CF_TEXT format.
• The Windows NT version of Notepad, installed by default on Windows 2000 and XP, can detect Unicode files even when they are missing a byte order mark.
• It uses a Windows API function called “IsTextUnicode()”: we pass it some data, and it tells us whether the data looks like UTF-16-encoded text or not.
• This function is imperfect and incorrectly identifies some all-lowercase ASCII text as UTF-16.

• As a result, Notepad interprets a file containing a phrase like "aaaa aaa aaa aaaaa" as a two-byte (UTF-16) Unicode text file and attempts to display it as such.
• Text files containing UTF-16-encoded Unicode are supposed to start with a "Byte-Order Mark" (BOM), a two-byte flag that tells a reader how the following UTF-16 data is encoded. Since these two bytes are exceedingly unlikely to occur at the beginning of an ASCII text file, the BOM is commonly used to tell whether a text file is encoded in UTF-16.
• WinCustomize.com discovered an odd bug in Notepad that is triggered by a text file consisting of a four-letter word, two three-letter words, and a five-letter word. Some examples of such sentences are:
a) bush hid the facts
b) 1111 111 111 11111
c) this txt are longs
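
The effect of the wrong guess can be reproduced outside Notepad. The sketch below is only an illustration, not what Notepad or IsTextUnicode() actually runs: the helper name misread_as_utf16 is made up for this example, and it assumes little-endian UTF-16, which is the usual byte order on Windows. Every phrase above is 18 bytes long, so its bytes pair up cleanly into 9 two-byte code units.

```python
def misread_as_utf16(text: str) -> str:
    """Encode text as single-byte ASCII, then decode the same bytes as UTF-16 LE.

    Hypothetical helper for illustration; it mimics the *result* of the wrong
    guess, not the detection logic inside Notepad.
    """
    raw = text.encode("ascii")      # one byte per character, as an ANSI save would write
    return raw.decode("utf-16-le")  # two bytes per character, as a UTF-16 reader would see it


phrases = [
    "this app can break",
    "bush hid the facts",
    "1111 111 111 11111",
    "this txt are longs",
]

for phrase in phrases:
    garbled = misread_as_utf16(phrase)
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in garbled)
    print(f"{len(phrase)} bytes -> {len(garbled)} characters: {code_points}")
```

For “this app can break” the first code unit is U+6874, built from the bytes of "t" (0x74) and "h" (0x68). The 9 resulting characters match the 9 boxes shown at the top of this post: a font without glyphs for those code points draws each one as a box.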

• This happens because Notepad has no reliable way to recognize which encoding a file without a BOM is in.
• If we tell Notepad which font to use, e.g. “Arial Unicode MS”, then it retains the original text format.
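
A reader that relies on the byte-order mark instead of guessing can be sketched as follows. This is a minimal example, assuming we only care about the UTF-8 and UTF-16 marks; the function name sniff_bom is hypothetical, and the BOM constants come from Python's standard codecs module.

```python
import codecs
from typing import Optional


def sniff_bom(data: bytes) -> Optional[str]:
    """Return a codec name if data starts with a known BOM, else None.

    Hypothetical helper; UTF-32 marks are deliberately ignored to keep the
    sketch short.
    """
    if data.startswith(codecs.BOM_UTF8):       # EF BB BF
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "utf-16-be"
    return None                                # no BOM: the encoding must be guessed


with_bom = codecs.BOM_UTF16_LE + "this app can break".encode("utf-16-le")
without_bom = "this app can break".encode("ascii")

print(sniff_bom(with_bom))     # utf-16-le
print(sniff_bom(without_bom))  # None
```

The second call returns None, which is exactly the situation in which Notepad falls back to guessing.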

Encoding, Decoding, and Differences between Unicode and ASCII

Encoding:

1) Encoding is the process of transforming information from one format into another

2) Encoding is the process of putting a sequence of characters (letters, numbers, punctuation, and certain symbols) into a specialized format for efficient transmission or storage.

3) Encoding and decoding are used in data communications, networking, and storage

4) In encoding, the data is converted into a machine-readable format.

5) It is used to convert plain text into a different form by means of a code.

6) Data may also be encoded for security reasons
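
As a small illustration of the points above, the sketch below encodes one string with several character encodings. The helper name encode_in and the sample word are choices made for this example only.

```python
from typing import Optional


def encode_in(text: str, codec: str) -> Optional[bytes]:
    """Encode text with the given codec; return None if the codec cannot represent it."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


text = "Caf\u00e9"  # "Café": the last letter is outside the ASCII range

for codec in ("ascii", "latin-1", "utf-8", "utf-16-le"):
    encoded = encode_in(text, codec)
    if encoded is None:
        print(f"{codec}: cannot represent this text")
    else:
        print(f"{codec}: {len(encoded)} bytes, hex {encoded.hex()}")
```

The same four characters become 4 bytes in latin-1, 5 bytes in utf-8 and 8 bytes in utf-16-le, while ascii cannot store them at all.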

Decoding:

1) Decoding is the reverse of encoding: it transforms encoded information back into its original format.

2) It is also the process of restoring original messages from the forms in which they were transmitted, stored or enciphered by applying a suitable code.

3) The encoded data is converted back into ordinary (original) language.

4) A secret key or password is required only when the content was encrypted; content that was merely encoded can be decoded by anyone who knows the scheme.
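
Decoding only restores the original text when it uses the same scheme that was used for encoding. A minimal sketch, with a sample word chosen for this example:

```python
original = "Caf\u00e9"            # "Café"
stored = original.encode("utf-8")  # bytes as they would be stored or transmitted

right = stored.decode("utf-8")     # same scheme: the original text comes back
wrong = stored.decode("latin-1")   # different scheme: no error, but the text is garbled

print(right == original)  # True
print(wrong == original)  # False
print(len(right), len(wrong))  # 4 5
```

Here the two UTF-8 bytes of the last letter are read as two separate latin-1 characters, so the wrongly decoded string is one character longer. The Notepad bug described earlier is the same kind of mismatch between the scheme used for saving and the scheme assumed when opening.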

Differences between ASCII and Unicode:

1) ASCII is a 7-bit encoding technique that assigns a number to each of its 128 characters &

Unicode is a coding system for electronic text that aims to cover every writing system in use.

2) The range of ASCII is 128 characters (95 printable and 33 control), and it supports only one script &

Unicode covers about 107,000 characters and 90 scripts (as of Unicode 5.2; later versions add more).

3) ASCII is a 7-bit character set, designed to run in a computer environment of at least 8 bits &

The Unicode character set uses a 21-bit code space (code points U+0000 to U+10FFFF) and is intended to eventually include every character in common use in every known language

4) ASCII defines 128 characters numbered from 0 to 127, or in hexadecimal from 00 to 7F &

Unicode includes the ASCII set as its first 128 characters and adds many more characters.

5) ASCII supports only American English and

Unicode supports many of the world’s languages.

6) Most software and communication programs understand ASCII and

Not all software and communication programs understand Unicode.
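
Some of these differences can be checked directly in Python, whose strings are sequences of Unicode code points. This is a rough sketch; the helper name describe is made up for this example.

```python
import sys
from typing import Tuple


def describe(ch: str) -> Tuple[int, bool, int]:
    """Return (code point, fits in 7-bit ASCII?, number of bytes in UTF-8)."""
    code_point = ord(ch)
    return code_point, code_point < 128, len(ch.encode("utf-8"))


for ch in ("A", "\u00e9", "\u20ac"):  # Latin A, e with acute accent, euro sign
    code_point, in_ascii, utf8_len = describe(ch)
    print(f"U+{code_point:04X} ascii={in_ascii} utf8_bytes={utf8_len}")

# The first 128 Unicode code points are exactly the ASCII characters,
# so pure ASCII text has identical bytes in ASCII and UTF-8.
print("notepad".encode("ascii") == "notepad".encode("utf-8"))  # True

# The highest Unicode code point is U+10FFFF, which needs 21 bits.
print(hex(sys.maxunicode), sys.maxunicode.bit_length())  # 0x10ffff 21
```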

Base64 encoding and URL encoding

What is Base64 encoding? Where is it used?

1) Base64 is a generic term for a number of similar encoding schemes that encode binary data by treating it numerically and translating it into a base-64 representation.
2) The term Base64 originates from a specific MIME content transfer encoding.
3) Base64 encoding schemes are commonly used when binary data needs to be stored in or transferred over media that are designed to deal with textual data.
4) This ensures that the data remains intact, without modification, during transport.
5) Base64 is commonly used in a number of applications, including email via MIME and storing complex data in XML.
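
A minimal round trip with Python's standard base64 module looks like this. The two wrapper names are hypothetical and only exist to keep the example readable.

```python
import base64


def to_base64_text(data: bytes) -> str:
    """Turn arbitrary bytes into plain ASCII text using standard Base64."""
    return base64.b64encode(data).decode("ascii")


def from_base64_text(text: str) -> bytes:
    """Restore the original bytes from Base64 text."""
    return base64.b64decode(text)


print(to_base64_text(b"Man"))    # TWFu
print(to_base64_text(b"hello"))  # aGVsbG8=

binary = bytes(range(256))       # every possible byte value, not valid text on its own
as_text = to_base64_text(binary)
print(from_base64_text(as_text) == binary)  # True
```

Every 3 input bytes become 4 output characters, and "=" pads the last group when the input length is not a multiple of 3.
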

What is URL encoding? Where is it used?

1) URL encoding is the process of converting a string into a valid URL format.
2) Valid URL format means that the URL contains only what is termed "alpha | digit | safe | extra | escape" characters.
3) URL encoding is normally performed to convert data passed via HTML forms, because such data may contain special characters, such as "/", ".", "#", and so on, which could either:
a) Have special meanings; or
b) Not be valid characters for a URL; or
c) Be altered during transfer.

4) One of the most common encounters with URL encoding is when dealing with HTML forms.
5) URL encoding converts characters into a format that can be safely transmitted over the Internet.
6) URL encoding replaces unsafe ASCII characters with "%" followed by two hexadecimal digits corresponding to the character values in the ISO-8859-1 character-set.
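
The replacement rule can be tried out with Python's standard urllib.parse module. This is a sketch under a few assumptions: the wrapper name url_encode is made up, safe="" is passed so that "/" is encoded too, and the character set is a parameter because quote() defaults to UTF-8 rather than ISO-8859-1.

```python
from urllib.parse import quote, unquote


def url_encode(value: str, encoding: str = "utf-8") -> str:
    """Percent-encode every character that is not a letter, digit, or one of _.-~"""
    return quote(value, safe="", encoding=encoding)


print(url_encode("a b/c#d"))                     # a%20b%2Fc%23d
print(url_encode("\u00e9", encoding="latin-1"))  # %E9
print(url_encode("\u00e9"))                      # %C3%A9
print(unquote("a%20b%2Fc%23d"))                  # a b/c#d
```

The same accented letter becomes one escape in ISO-8859-1 and two in UTF-8, so the sender and the receiver have to agree on the character set, just as with the Notepad example.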