Correctly detect the encoding #4160

alexandernst · 2014-11-12T16:50:16Z

Atom should try its best to guess the encoding when opening a file. Only if that fails should it fall back to the default encoding option (which was implemented a few days ago).

Missing libraries (that would presumably help with this issue) shouldn't be a factor here. If no suitable library exists, one should be written from scratch, or the closest existing one should be patched with as little work as possible.

The main issue here is that Atom will try to open files as UTF-8 (or as the default encoding, if one is set), which can break any non-UTF-8 file. And that, IMHO, is not acceptable behavior for any editor.
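Here is one concrete way a non-UTF-8 file gets broken. This is a minimal Python sketch, not Atom's code. It assumes the editor replaces undecodable bytes with U+FFFD on open and writes UTF-8 on save. `open_as_utf8_and_save` is a hypothetical helper, and windows-1252 is just an example of a legacy encoding.

```python
def open_as_utf8_and_save(original: bytes) -> bytes:
    """Hypothetical helper: decode as UTF-8 (lossy), then save as UTF-8."""
    # Bytes that are not valid UTF-8 become U+FFFD REPLACEMENT CHARACTER.
    text = original.decode("utf-8", errors="replace")
    # Saving writes the replacement character, not the original byte.
    return text.encode("utf-8")


original = "café".encode("cp1252")          # b'caf\xe9' on disk
saved = open_as_utf8_and_save(original)
print(original, saved)                      # b'caf\xe9' b'caf\xef\xbf\xbd'
print(saved == original)                    # False: the é is gone for good
```

Once the é has become U+FFFD, no later re-encoding can bring back the original byte. The encoding therefore has to be right at the moment the file is opened.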

This should be tagged as a 1.0 blocker.

Also, maybe this could help:

bizoo · 2014-12-23T13:23:27Z

IMHO, a highly desirable feature.
Like in Sublime Text, it is enough to detect a BOM for UTF files automatically, or to detect UTF-8 without a BOM by trying to decode the file as UTF-8.
If it's not a UTF file, open it with the default encoding.

IMHO, encoding-detector libraries lead to unpredictable results that are worse than the current behavior.
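The BOM, then strict UTF-8, then default order described above can be sketched in a few lines of Python. This illustrates the idea under stated assumptions; it is not Sublime Text's or Atom's implementation. `decode_file` is a hypothetical helper, and `cp1252` is an assumed stand-in for whatever default encoding the user has configured.

```python
import codecs

# BOMs checked in order. UTF-32 LE must come before UTF-16 LE because
# its BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_file(data: bytes, default: str = "cp1252") -> tuple[str, str]:
    """Hypothetical helper: return (text, encoding) via BOM -> UTF-8 -> default.

    `default` is an assumed placeholder for the configured fallback encoding.
    """
    # 1. A BOM identifies the Unicode encoding; strip it before decoding.
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding), encoding
    # 2. No BOM: treat the file as UTF-8 only if every byte sequence is valid.
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    # 3. Not UTF: use the default, marking undecodable bytes with U+FFFD.
    return data.decode(default, errors="replace"), default


print(decode_file(codecs.BOM_UTF16_LE + "operação".encode("utf-16-le")))
# ('operação', 'utf-16-le')
print(decode_file("operação".encode("cp1252")))
# ('operação', 'cp1252')
```

The strict UTF-8 attempt is what keeps this approach predictable. Legacy 8-bit text with accented letters is rarely valid UTF-8 by accident, so such a file falls through to the default rather than being guessed at. Pure-ASCII files are reported as UTF-8. That is harmless, because ASCII is a subset of both UTF-8 and windows-1252.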

benogle · 2015-07-01T00:35:46Z

We could add an option to make auto-detection the default, as described in atom/encoding-selector#24

benogle · 2015-07-09T22:38:51Z

This would also be helpful for find/replace: atom/scandal#26

We'd need to bundle libicu, which looks to be about 18 MB.

rugk · 2015-08-09T14:10:03Z

Also note my problems with ANSI encoding. They should fit in this category.

rogeriopradoj · 2015-12-08T20:10:22Z

Same problem for me here. When Microsoft SQL Server generates scripts from a database, it creates them as Unicode, but in UTF-16 rather than UTF-8.

When Sublime tries to open it as UTF-8, it looks like this:

opera��o

... when it should be:

operação

It is the same if I try to create files in ANSI.
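The `��` above is what an 8-bit ANSI file looks like when it is decoded as UTF-8. The short Python sketch below reproduces it. It assumes "ANSI" here means windows-1252 (cp1252), the usual ANSI code page on Western-European and Portuguese Windows systems. `misread_as_utf8` is a hypothetical helper.

```python
def misread_as_utf8(text: str, actual_encoding: str) -> str:
    """Hypothetical helper: save `text` in `actual_encoding`, then open it as UTF-8."""
    raw = text.encode(actual_encoding)
    # In cp1252, ç is 0xE7 and ã is 0xE3. Each is a UTF-8 lead byte that is
    # not followed by a valid continuation byte, so each becomes U+FFFD (�).
    return raw.decode("utf-8", errors="replace")


print(misread_as_utf8("operação", "cp1252"))  # opera��o
print(misread_as_utf8("operação", "utf-8"))   # operação
```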

andresmendes · 2015-12-16T21:13:22Z

I use files in UTF-8 and windows-1252.
My default character set encoding is UTF-8.
I use the auto-encoding package to change the encoding automatically.

When I open a windows-1252 file with the words "aço" and "papelão" Atom changes to the right encoding (windows-1252) and the result is perfect:

aço papelão

But when I open a windows-1252 file with the word "operação" (similar to what @rogeriopradoj did), Atom changes the encoding to windows-1251 and displays:

operaзгo

When the word is "operações" Atom changes the encoding to ibm855 and exhibits:

operaушes

Two special characters side by side (like çã or çõ) seem to confuse the auto-detection.
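The garbled words above are exactly what windows-1252 bytes look like when decoded with the code page the detector guessed. The decoding step works correctly; the guess is what goes wrong. A small Python sketch reproduces both outputs. `misdetected` is a hypothetical helper.

```python
def misdetected(text: str, real: str, guessed: str) -> str:
    """Hypothetical helper: bytes saved as `real`, decoded as the guessed encoding."""
    return text.encode(real).decode(guessed)


# ç, ã and õ are single bytes in windows-1252 (0xE7, 0xE3, 0xF5);
# the Cyrillic code pages map those same bytes to Cyrillic letters.
print(misdetected("operação", "cp1252", "cp1251"))     # operaзгo
print(misdetected("operações", "cp1252", "cp855"))     # operaушes
# With the right guess, the same bytes decode cleanly.
print(misdetected("aço papelão", "cp1252", "cp1252"))  # aço papelão
```

Every byte is still intact in both cases, so re-opening the file with the correct encoding recovers the text, provided it has not been saved in the wrong encoding in the meantime. A short word with two adjacent accented letters likely gives a statistical detector very little evidence to go on.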

LeonBlade · 2015-12-16T21:44:15Z

I might as well leave this here: I had a problem saving a UTF-16 LE file. It defaulted to binary encoding when saving, and the file wouldn't load properly because of it. I had to re-save the file with another text editor just to fix it.
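BOM-less UTF-16 LE files like this one (and the case in #5516 below) could be caught by looking at where NUL bytes fall. This is one possible heuristic, not something Atom is known to implement. Mostly-Latin text in UTF-16 LE has a zero high byte at nearly every odd offset and almost none at even offsets. Below is a minimal Python sketch; `looks_like_utf16_le` and its thresholds are hypothetical.

```python
def looks_like_utf16_le(data: bytes, min_odd: float = 0.3, max_even: float = 0.05) -> bool:
    """Guess whether BOM-less bytes are UTF-16 LE (hypothetical heuristic).

    In UTF-16 LE each code unit is stored low byte first, so a character
    below U+0100 puts a NUL at the following odd offset.
    """
    # UTF-16 code units are 2 bytes, so valid data has an even length.
    if len(data) < 2 or len(data) % 2 != 0:
        return False
    even, odd = data[0::2], data[1::2]
    odd_nul_ratio = odd.count(b"\x00") / len(odd)
    even_nul_ratio = even.count(b"\x00") / len(even)
    return odd_nul_ratio >= min_odd and even_nul_ratio <= max_even


sample = "operação".encode("utf-16-le")
print(looks_like_utf16_le(sample))                      # True
print(looks_like_utf16_le("operação".encode("utf-8")))  # False
if looks_like_utf16_le(sample):
    print(sample.decode("utf-16-le"))                   # operação
```

This only works for text dominated by characters below U+0100. UTF-16 LE text in, say, Chinese has few NUL bytes and would need a different signal.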

4llan · 2016-04-18T14:05:45Z

Sometimes I have to work with files with Western (ISO-8859-1) encoding.

Opening the file: Atom selects UTF-8 by default (� everywhere).
"Auto Detect" changes the encoding to Windows-1251 (words like inúmeras and básico become inъmeras and bбsico).

It would be nice if Atom had smart encoding detection (I never had this problem in Sublime) and if the default encoding could be set to "Auto Detect" (atom/encoding-selector#24).

damieng · 2016-04-23T18:11:10Z

Tracking this feature request under the package responsible for it atom/encoding-selector#24

lock · 2018-04-09T06:36:02Z

This issue has been automatically locked since there has not been any recent activity after it was closed. If you can still reproduce this issue in Safe Mode then please open a new issue and fill out the entire issue template to ensure that we have enough information to address your issue. Thanks!

alexandernst mentioned this issue Nov 12, 2014

Auto-detect encoding on an already open file does not properly detect in all cases atom/encoding-selector#8

Open

lee-dohm mentioned this issue Dec 17, 2014

Auto Detect Encoding on open file from FTP atom/encoding-selector#13

Closed

lee-dohm mentioned this issue Feb 12, 2015

UTF-16LE file is opened with UTF-8 and only 7-bit ASCII characters are correctly decoded #5516

Closed

izuzak mentioned this issue May 23, 2015

Critical: Opening a file with command line arguments ends up in wrong encodings #6914

Closed

izuzak added the enhancement label May 23, 2015

hebbet mentioned this issue Jun 30, 2015

can i set auto-detect encoding as default #7547

Closed

thomasjo mentioned this issue Jul 8, 2015

Auto detect as file encoding #7785

Closed

mnquintana mentioned this issue Aug 9, 2015

ANSI encoding? #8290

Closed

50Wliu mentioned this issue Dec 24, 2015

UTF-16 may not be properly detected #10177

Closed

fnurl mentioned this issue Mar 23, 2016

The context of the "Default encoding" should be "new files" with another setting for opening files. #11255

Closed

damieng added the encoding label Apr 23, 2016

damieng closed this as completed Apr 23, 2016

lock bot locked and limited conversation to collaborators Apr 9, 2018
