Home » SQL & PL/SQL » SQL & PL/SQL » What is the difference between varchar2 and nvarchar2? (Oracle 18 Windows Server 2016)
What is the difference between varchar2 and nvarchar2? [message #689503] Wed, 24 January 2024 15:40 Go to next message
Darth Waiter
Messages: 78
Registered: October 2020
Member
Under Oracle, when I enter multibyte data into varchar2 or nvarchar2 it is stored correctly, whereas under MS SQL it would have been converted into question marks in varchar.
Having said that, what would be the reason for using nvarchar2?
drop table tbl;
create table tbl (clm nvarchar2(50)); -- nvarchar2 or varchar2 here does not change the output of the select below
insert into tbl
select 'He wes Leovenaðes sone -- liðe him be Drihten.' from dual
union select 'Sîne klâwen durh die wolken sint geslagen,' from dual
union select 'Τη γλώσσα μου έδωσαν ελληνική' from dual
union select 'गीता सुगीता कर्तव्या किमन्यैः शास्त्रविस्तरैः।' from dual
union select '齊天大聖孫悟空' from dual;
commit;
select a, vsize(a) as aa, b, vsize(b) as bb from(select clm as a, cast(clm as varchar(50 char)) as b from tbl)
Output:
a                                               aa      b                                               bb
----------------------------------------------------------------------------------------------------------
He wes Leovenaðes sone -- liðe him be Drihten.	92	He wes Leovenaðes sone -- liðe him be Drihten.	48
Sîne klâwen durh die wolken sint geslagen,	84	Sîne klâwen durh die wolken sint geslagen,	44
Τη γλώσσα μου έδωσαν ελληνική			58	Τη γλώσσα μου έδωσαν ελληνική			54
गीता सुगीता कर्तव्या किमन्यैः शास्त्रविस्तरैः।			92	गीता सुगीता कर्तव्या किमन्यैः शास्त्रविस्तरैः।			130
齊天大聖孫悟空					14	齊天大聖孫悟空					21
Other than bloated storage, there is no difference!

[Updated on: Wed, 24 January 2024 15:43]

Report message to a moderator

Re: What is the difference between varchar2 and nvarchar2? [message #689504 is a reply to message #689503] Wed, 24 January 2024 16:53 Go to previous messageGo to next message
Solomon Yakobson
Messages: 3273
Registered: January 2010
Location: Connecticut, USA
Senior Member
SQL> select parameter,value from nls_database_parameters where parameter like '%CHARACTERSET';

PARAMETER                 VALUE
------------------------- ---------------
NLS_NCHAR_CHARACTERSET    AL16UTF16
NLS_CHARACTERSET          AL32UTF8

SQL>
As you can see, VARCHAR2 and NVARCHAR2 can be stored in different character sets, so same text in VARCHAR2 and in NVARCHAR2 can have different byte size / vsize.

SY.

[Updated on: Wed, 24 January 2024 16:54]

Report message to a moderator

Re: What is the difference between varchar2 and nvarchar2? [message #689505 is a reply to message #689503] Thu, 25 January 2024 04:35 Go to previous messageGo to next message
John Watson
Messages: 8931
Registered: January 2010
Location: Global Village
Senior Member
My opinion (which is certainly open to debate) is that NVARCHAR2 and NCLOB data types are an historical anomaly that dates back to before Oracle supported Unicode. If I remember correctly, we've had Unicode since 9i. From the 23c docs:Quote:
Oracle recommends using SQL CHAR, VARCHAR2, and CLOB data types in AL32UTF8 database to store Unicode character data. Use of SQL NCHAR, NVARCHAR2, and NCLOB should be considered only if you must use a database whose database character set is not AL32UTF8.
Re: What is the difference between varchar2 and nvarchar2? [message #689507 is a reply to message #689505] Thu, 25 January 2024 10:23 Go to previous messageGo to next message
mathguy
Messages: 107
Registered: January 2023
Senior Member
I don't know "MS SQL" (or "SQL Server", I assume they are the same thing; I don't know either), but a quick Google search seems to suggest that in SQL Server, VARCHAR is for ASCII characters, and for extended characters as supported by one code page or another. What is relevant here is that VARCHAR is not for Unicode text. By contrast, still in SQL Server, NVARCHAR (and similar: NCHAR, NTEXT) is for Unicode; so if you want to work with text in Arabic, Hebrew, Chinese etc. you would always use NVARCHAR, never VARCHAR (in SQL Server).

In Oracle: First, Oracle made the idiotic choice that NULL and empty string are the same thing. Since this contradicts the SQL Standard (and it offends civilized taste), they called their data type VARCHAR2 - notice the 2 at the end. Similar: NVARCHAR2. For the time being (and possibly till pigs will fly), VARCHAR is a synonym to VARCHAR2, but the Oracle documentation states repeatedly that one should only use VARCHAR2, since VARCHAR may in the future be used for something else. The documentation doesn't say what, but everyone assumes "a data type where an empty string is not NULL".

Then: when you create a database (from scratch), you decide what character set and what encoding will be used for VARCHAR2, and separately for NVARCHAR2. I don't know the history, but I believe VARCHAR2 existed for some time (Oracle database first came out in 1979) before Unicode was even invented (in 1991). When Unicode became the right solution for supporting text in languages that don't use the Latin alphabet, Oracle introduced NVARCHAR2 - no doubt so that, at least in the first few versions, people could continue to use the 256-character code pages for VARCHAR2, but also have a solution for Unicode in NVARCHAR2. Later still, the default character set for new versions of Oracle became Unicode, so in theory there should be no need for NVARCHAR2.

This explains the difference between SQL Server and Oracle. In SQL Server, if you use VARCHAR for Chinese text, you will get all question marks, you must use NVARCHAR for such text. You don't have an option. In Oracle, if the "database" (VARCHAR2) and the "national" (NVARCHAR2) character sets are both Unicode, you can use either.

You asked about the difference between VARCHAR2 and NVARCHAR2 in Oracle. Even in the most current versions of Oracle, you can set something like CP-1252 (the default Windows character set for English installations), or even CP-437 (the older character set used by DOS, if you even know what that means), as the "database" character set, and use NVARCHAR2 for "foreign language" text. Then the db would behave the same way as SQL Server. The reason you might want to do something like this (even though I assume very few people do, these days) is efficiency. If a very large fraction of your business is in Hungarian, or in Icelandic, you can use the respective one-byte character set / code page for that language (one of the Windows code pages). If your business is, for example, publishing books in Hungarian, most of your data may be large CLOB chunks. Using a one-byte character set for that has many benefits. Not only "less storage" (that is probably the smallest benefit), but also the time to read from disk and to write to disk is cut in half; even more important when you back up to even slower media, like tape (if that is even done anymore). Processing too - looking for occurrences of a substring in a string (in the text of a book, to create an Index, for example) - faster on one-byte encodings.

Even if you use Unicode for both VARCHAR2 and NVARCHAR2, it may make sense to take advantage of the different encoding: UTF-8 vs UTF-16. If most of your text is English (say), UTF-8 will be more efficient, since ASCII characters in UTF-8 are stored in single bytes. Then use UTF-8 as the database "character set" and UTF-16 as the national "character set" (for text in other languages, including Arabic, Chinese etc.). By the way, I put "character set" in quotation marks, because the term is used incorrectly in this context; UTF-8 and UTF-16 are different encodings of the same character set, Unicode. On the other hand, if most of your text is in Chinese, for example, you will probably want UTF-16 as your database "character set" (used by VARCHAR2), since UTF-16 encodes most characters in 2 bytes, while non-ASCII characters in UTF-8 may take up to 4 bytes. You already saw the differences in your own tests.
Re: What is the difference between varchar2 and nvarchar2? [message #689511 is a reply to message #689507] Fri, 26 January 2024 12:09 Go to previous messageGo to next message
Darth Waiter
Messages: 78
Registered: October 2020
Member
mathguy wrote on Thu, 25 January 2024 10:23
I don't know "MS SQL"
To me, your essay sounds so AI-generated, due to its vocabulary, tone, and structure.
Anyhow, from what I gather here and elsewhere, VARCHAR2 is sufficient, and NVARCHAR2 is overkill.
I only wonder why, for Chinese characters, the difference is 1 1/2 times, not 2 times, but I probably do not have to worry about that.

mathguy wrote on Thu, 25 January 2024 10:23
The documentation doesn't say what
It does, now.

[Updated on: Fri, 26 January 2024 12:10]

Report message to a moderator

Re: What is the difference between varchar2 and nvarchar2? [message #689512 is a reply to message #689511] Fri, 26 January 2024 13:22 Go to previous messageGo to next message
Michel Cadot
Messages: 68647
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Quote:
I only wonder why, for Chinese characters, the difference is 1 1/2 times, not 2 times,
It depends on your character sets (see Solomon's post).
Most likely you have AL32UTF8 and AL16UTF16 character sets and most common Unified Chinese characters are coded in 3 bytes in the former (in the binary format 1110xxxx, 10xxxxxx, 10xxxxxx) and 2 in the later (in the range 4E00-9FFF).
Execute the DUMP function and you will see how the values are coded.

Re: What is the difference between varchar2 and nvarchar2? [message #689513 is a reply to message #689512] Fri, 26 January 2024 14:00 Go to previous messageGo to next message
Darth Waiter
Messages: 78
Registered: October 2020
Member
Michel Cadot wrote on Fri, 26 January 2024 13:22

Quote:
I only wonder why, for Chinese characters, the difference is 1 1/2 times, not 2 times,
It depends on your character sets (see Solomon's post).
Most likely you have AL32UTF8 and AL16UTF16 character sets and most common Unified Chinese characters are coded in 3 bytes in the former (in the binary format 1110xxxx, 10xxxxxx, 10xxxxxx) and 2 in the later (in the range 4E00-9FFF).
Execute the DUMP function and you will see how the values are coded.

Wow! I learned something new today. Never thought that some alphabets require 3 bytes.

Also, something is wrong with email notifications from this forum, namely with this section of the HTML body section:

<tr class=3D"RowStyleB">
	<td colspan=3D"3">
		[ <a href=3D"//www.orafaq.com/forum/?t=3Dpost&reply_to=3D689512">Reply<=
/a> ][ <a href=3D"//www.orafaq.com/forum/?t=3Dpost&reply_to=3D689512&quot=
e=3Dtrue">Quote</a> ][ <a href=3D"//www.orafaq.com/forum/?t=3Drview&goto=3D=
689512#msg_689512">View Topic/Message</a> ][ <a href=3D"//www.orafaq.com/=
forum/?t=3Drview&th=3D208567">Unsubscribe from this topic</a> ]
	</td>
</tr>
</table>
</div>
</body></html>
The links are not clickable because they do not contain https protocol and colon before double slashes. Also, the equal sign may have been messed up. This is easy to reproduce if the body of the email is viewed as plain text and the HTML section is saved in a file and opened. The links will appear corrupt.

[Updated on: Fri, 26 January 2024 14:00]

Report message to a moderator

Re: What is the difference between varchar2 and nvarchar2? [message #689514 is a reply to message #689513] Fri, 26 January 2024 15:08 Go to previous messageGo to next message
Michel Cadot
Messages: 68647
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Post the problem in the Suggestions & Feedback forum, Frank the webmaster and owner and developer of this site will see it.
You can also directly PM him.

[Updated on: Fri, 26 January 2024 15:09]

Report message to a moderator

Re: What is the difference between varchar2 and nvarchar2? [message #689515 is a reply to message #689514] Fri, 26 January 2024 16:15 Go to previous messageGo to next message
Darth Waiter
Messages: 78
Registered: October 2020
Member
Also, it is fascinating that in varchar2 the record #4 from my example above is 130 bytes long, whereas it is only 92 bytes in nvarchar2.
So, nvarchar2 stores more bytes for some locales, but fewer bytes for others. Is there a logical explanation for that?

[Updated on: Fri, 26 January 2024 16:18]

Report message to a moderator

Re: What is the difference between varchar2 and nvarchar2? [message #689517 is a reply to message #689515] Fri, 26 January 2024 23:45 Go to previous messageGo to next message
mathguy
Messages: 107
Registered: January 2023
Senior Member
Wow! I learned something new today. Never thought that some alphabets require 3 bytes.

Not quite; even in the same alphabet (and in the same UTF-8 encoding), some characters may be encoded in one byte and some in two; or some in two and some in three.

It just happens that in your Chinese example, all characters require 3 bytes in UTF-8.

But if you look at the Devanagari example:

Also, it is fascinating that in varchar2 the record #4 from my example above is 130 bytes long, whereas it is only 92 bytes in nvarchar2.

There are 46 characters (either use the LENGTH function, or work back from 92 bytes in UTF-16, which uses 2 bytes for most characters regardless of alphabet). In UTF-8 the same text requires 130 bytes; this means that 38 of the characters require 3 bytes in UTF-8, while the other 8 characters require only 2 bytes. EDIT Actually, using the DUMP function shows that something else is happening in this case. All the Devanagari characters are actually 3 bytes; but there are four SPACES in the text, and those are one character each. So, 42 * 3 + 4 * 1 = 130. END EDIT

You seem surprised by several aspects of UTF-8 in particular; if you are going to use it, especially for multiple alphabets, you may benefit from reading more about it. It's the default encoding for new Oracle db installations (Oracle calls it "character set", incorrectly - and many practitioners just follow Oracle in that usage), so being familiar with it may have unexpected benefits.

So, nvarchar2 stores more bytes for some locales, but fewer bytes for others. Is there a logical explanation for that?

Yes, there is. The UTF-16 encoding of Unicode uses 2 bytes for most characters; there are wider characters, but those are generally Emoji and such, not "legitimate script". By contrast, the UTF-8 encoding uses only 1 byte for ASCII characters, but generally between 2 and 4 bytes for other characters. The medieval German text is 42 characters long, most of which are ASCII. The UTF-16 encoding requires 84 bytes (simple math, 42 * 2). The UTF-8 encoding requires 44 bytes; and there are exactly 2 accented characters. Trivial guess: each is encoded in 2 bytes. With a bit of arithmetic, you can "play detective" with your examples even without using the DUMP function (which was recommended several times).

To me, your essay sounds so AI-generated, due to its vocabulary, tone, and structure.

I am not surprised; I have no doubt that AI engines are plagiarizing my answers (and those of others, I don't think I am "special" in any way), on this and other discussion boards.

[Updated on: Sat, 27 January 2024 20:39]

Report message to a moderator

Re: What is the difference between varchar2 and nvarchar2? [message #689518 is a reply to message #689517] Sat, 27 January 2024 20:51 Go to previous message
mathguy
Messages: 107
Registered: January 2023
Senior Member
Here's one way to use DUMP to look at the encoding of individual characters in one of your strings. DUMP shows the data type (1 for strings), the length in bytes - note that in UTF16 the length is always 2; in UTF8 the byte length of non-accented characters is 1 while only the accented characters have length 2.

with
  input (clm) as (
    select 'Sîne klâwen durh die wolken sint geslagen' from dual
  )
, c (ch) as (
    select  substr(clm, level, 1)
    from    input
    connect by level <= length(clm)
  )
select ch, dump(ch, 16) utf8, dump(to_nchar(ch), 16) utf16
from   c
;


CH   UTF8                 UTF16               
---- -------------------- --------------------
S    Typ=1 Len=1: 53      Typ=1 Len=2: 0,53   
î    Typ=1 Len=2: c3,ae   Typ=1 Len=2: 0,ee   
n    Typ=1 Len=1: 6e      Typ=1 Len=2: 0,6e   
e    Typ=1 Len=1: 65      Typ=1 Len=2: 0,65   
     Typ=1 Len=1: 20      Typ=1 Len=2: 0,20   
k    Typ=1 Len=1: 6b      Typ=1 Len=2: 0,6b   
l    Typ=1 Len=1: 6c      Typ=1 Len=2: 0,6c   
â    Typ=1 Len=2: c3,a2   Typ=1 Len=2: 0,e2   
w    Typ=1 Len=1: 77      Typ=1 Len=2: 0,77   
e    Typ=1 Len=1: 65      Typ=1 Len=2: 0,65   
n    Typ=1 Len=1: 6e      Typ=1 Len=2: 0,6e
------------------------------------------- 
----- ETC ---------------------------------
Previous Topic: DBMS_OUTPUT how to include a white space
Next Topic: Evaluating a sys_context call in a predicate
Goto Forum:
  


Current Time: Sat Apr 27 18:53:20 CDT 2024