Save web page as txt - no longer strips html/code (Linux)

Discussion of general topics about Mozilla Firefox
Nettkrawler
Posts: 137
Joined: May 3rd, 2010, 3:31 pm

Save web page as txt - no longer strips html/code (Linux)

Post by Nettkrawler »

Hi.

This thread ends in a very weird way. It kind of boils down to what I think is the real problem, but I'd still appreciate any ideas and suggestions you may have.


My current system:
Firefox Quantum 64.0 (64-bit), Mozilla Firefox for Ubuntu, Canonical 1.0
OS: Linux Lite 4.2 (Ubuntu 18.04)
HW: Ancient laptop "Dell Latitude D620", upgraded to 3GB RAM, otherwise all original components (yes, it's actually good enough for web browsing)

Thing is
I have the habit of storing web pages as local text files. This has worked well for me: I can do very fast file searches, every URL link is preserved and therefore searchable, and there are other benefits, including the fact that txt files consume very little space.

Problem is
Whenever I save a web page and choose text (instead of HTML, HTML full page, etc.), and even if I manually give the file name a ".txt" ending in the "Save As" dialog box, the end result is exactly the same as if I had chosen "Web page, complete", except that the file extension is txt.

Expected result: A local txt file, stripped of all HTML tags and script lines. No other files or folders in the same directory.

Actual result: The file is saved with the filename I manually give it. When I open the text file in a text editor, all the HTML tags are still there.
And there is a folder with the same name as the file, holding some scripts and image files.


A really dumb thing just happened
While typing the text above, I tried saving a page just to verify the naming scheme of the folder that is created alongside a saved file. But suddenly I cannot reproduce the behaviour; the problem seems to have solved itself.

I'm a little bit confused right now, but thinking about it, this has one strange thing in common with another problem I have had: just a few minutes after I observed the problem, the laptop (old Dell Latitude D620) cut power, probably because of overheating (it was too close to a hot oven).
So the theory here is that back in the Windows XP days, the drivers had a threshold for the maximum operating temperature of the CPU (or some other onboard temperature sensor), while Ubuntu allows the CPU to heat up until the hardware protection circuits kick in and kill the system, and the system malfunctions slightly in the minutes before this occurs.

Other things I wonder might cause the non-reproducible behaviour (I don't know this, it's just a gut feeling):
  1. The themes for this specific distro. Yes, it has some 50-ish different color themes, and in some of them various input fields in dialog boxes seem to behave differently, or in some cases cannot be selected at all.
  2. Whether Firefox itself is somehow aware of the location the files are saved to. That is, would it behave differently if I chose to save in the "download" folder instead of the "pictures" folder?
Last edited by DanRaisch on March 12th, 2019, 1:34 pm, edited 1 time in total.
Reason: (Linux) added to subject line.
Nettkrawler
Posts: 137
Joined: May 3rd, 2010, 3:31 pm

Re: Save web page as txt - no longer strips html/code

Post by Nettkrawler »

UPDATE: The problem still persists. It seems to be tied to one user profile (OS user); the other user doesn't have the problem at all.
For reference, say that for User1 all is OK, but for User2 the text-saving problem persists.

OK, so it shouldn't be Firefox itself that causes the problem. I have to find out what it is.

Here are the things I have tried; none of them made any difference:
- Maybe the naming scheme had something to do with it, so I tried very short file names (no spaces), as well as saving into /home/User2/pictures.
- A different web page.
- Disabling NoScript.
- Changing the order in which I manually type ".txt" and select "text files" in the drop-down menu of the Save As dialog box.

The next thing to try is to change the theme for User2. The current style is "Greybird".


[edit]
User1 had appearance/style set to "Adapta", so I changed User2's theme to "Adapta" as well. No difference.

Next test: wait a quarter of an hour to see if overheating is the direct cause.


[edit 2]
No, it wasn't the heat either; the problem still persists for User2.


[edit 3]
User1 had uBlock Origin added to Firefox, while User2 had not. So I added uBlock Origin to User2 as well.
Result: No change. But now it turns out that the download manager tends to flag some text downloads as "failed". If I don't remove the file from the download manager, the file is actually downloaded, but of course (sadly) still formatted as HTML (full web page download).

This is getting frustrating, because I cannot spot a single setting that would stop Firefox from saving a web page in proper text format.

This leaves me with only 2 usable workarounds:
- Use a text editor (Geany) to strip the HTML from the file (which is named .txt), but I have to figure out exactly how.
- Use User1, and save files to a location both users have access to (I've done that, so that is the easiest way out for now).
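For the first workaround, here is a minimal sketch of stripping a mis-saved page outside the editor, assuming Python 3 is available (it ships with Ubuntu 18.04-based systems). It uses the standard library's `html.parser`, drops `script`/`style`/`noscript` contents, and writes each link target as `text <url>`, roughly like the plain-text output Firefox produces for User1. It is an approximation, not Firefox's own text serializer, and the script name `strip_html.py` is hypothetical.

```python
import re
import sys
from html.parser import HTMLParser

# Block-level tags that start a new line in the output (a rough approximation).
BREAK_TAGS = {"br", "p", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
# Tags whose contents are never visible text.
SKIP_TAGS = {"script", "style", "noscript"}


class TextExtractor(HTMLParser):
    """Collect visible text; append each link's href as ' <url>'."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &amp; -> & etc.
        self.parts = []
        self.skip_depth = 0
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "a":
            self.href = dict(attrs).get("href")
        elif tag in BREAK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag == "a" and self.href:
            self.parts.append(f" <{self.href}>")
            self.href = None

    def handle_data(self, data):
        if not self.skip_depth:
            # Source-code line breaks and indentation collapse to one space.
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    out, prev_blank = [], True  # True here drops leading blank lines
    for line in "".join(parser.parts).splitlines():
        line = line.strip()
        if line:
            out.append(line)
            prev_blank = False
        elif not prev_blank:
            out.append("")  # keep at most one blank line in a row
            prev_blank = True
    return "\n".join(out).strip() + "\n"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python3 strip_html.py saved-page.txt > saved-page-clean.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        sys.stdout.write(html_to_text(fh.read()))
```

The line-break handling is deliberately crude (only a handful of block tags start a new line), so tables and nested layouts will not come out as tidy as a real text save.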

I still hope some of you guys have some clever ideas. What can I try next?
therube
Posts: 21714
Joined: March 10th, 2004, 9:59 pm
Location: Maryland USA

Re: Save web page as txt - no longer strips html/code

Post by therube »

And if you try Safe Mode?
Fire 750, bring back 250.
Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.19) Gecko/20110420 SeaMonkey/2.0.14 Pinball CopyURL+ FetchTextURL FlashGot NoScript
Grumpus
Posts: 13246
Joined: October 19th, 2007, 4:23 am
Location: ... Da' Swamp

Re: Save web page as txt - no longer strips html/code

Post by Grumpus »

@Nightcrawler - Open your system monitor and watch what happens when certain functions and software are being used.
Watch the Processes tab for activity and also look at the File Systems tab.
Also watch the resources (the center graph) when issues arise.
This should help locate what is causing the heat increase.
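If System Monitor isn't available, the temperatures themselves can be read from a terminal. A minimal sketch, assuming the kernel exposes thermal zones under `/sys/class/thermal/thermal_zone*/` (each with a `type` file and a `temp` file in millidegrees Celsius); whether the D620's firmware exposes any zones there is an assumption, not something verified in this thread. The file name `thermal.py` is hypothetical.

```python
import glob
import os


def read_thermal_zones(root="/sys/class/thermal"):
    """Return {"thermal_zoneN:type": degrees_celsius} for every readable zone.

    The kernel reports each zone's temperature in millidegrees Celsius.
    """
    readings = {}
    for zone in sorted(glob.glob(os.path.join(root, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "type")) as fh:
                kind = fh.read().strip()
            with open(os.path.join(zone, "temp")) as fh:
                millideg = int(fh.read().strip())
        except (OSError, ValueError):
            continue  # zone not readable (or not a number) on this machine
        readings[f"{os.path.basename(zone)}:{kind}"] = millideg / 1000.0
    return readings


if __name__ == "__main__":
    zones = read_thermal_zones()
    if not zones:
        print("No thermal zones exposed under /sys/class/thermal")
    for name, celsius in zones.items():
        print(f"{name}: {celsius:.1f} C")
```

Running it every few seconds with `watch -n 5 python3 thermal.py` while saving pages would show whether temperatures are climbing before a failure.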

I tried the same text save and the HTML is there as well, so this may be the default; consider someone who needs to edit the HTML to correct it.
Doesn't matter what you say, it's wrong for a toaster to walk around the house and talk to you
Nettkrawler
Posts: 137
Joined: May 3rd, 2010, 3:31 pm

Re: Save web page as txt - no longer strips html/code

Post by Nettkrawler »

therube - I tried safe mode (user2), same behavior.

Grumpus - I cannot find that function on my system.

This time I used Terminal to open Firefox, because it's the fastest way, and got this output:

Copy / paste from Terminal - User2

Code:

Welcome to Linux Lite 4.2 user2
 
tirsdag 05 mars 2019, 18:33:46
Memory Usage: 516/2992MB (17.25%)
Disk Usage: 6/13GB (54%)
Support - https://www.linuxliteos.com/forums/ (Right click, Open Link)
 
 user2  ~  firefox

(firefox:7620): Gtk-WARNING **: 18:33:51.250: Theme parsing error: <data>:1:34: Expected ')' in color definition

(firefox:7620): Gtk-WARNING **: 18:33:51.250: Theme parsing error: <data>:1:76: Expected ')' in color definition
 user2  ~  firefox -safe-mode

(firefox:7865): Gtk-WARNING **: 18:34:23.845: Theme parsing error: <data>:1:34: Expected ')' in color definition

(firefox:7865): Gtk-WARNING **: 18:34:23.845: Theme parsing error: <data>:1:76: Expected ')' in color definition

###!!! [Parent][MessageChannel] Error: (msgtype=0x190084,name=PBrowser::Msg_Destroy) Closed channel: cannot send/recv


###!!! [Child][MessageChannel] Error: (msgtype=0x300112,name=PContent::Msg_DetachBrowsingContext) Closed channel: cannot send/recv

 user2  ~  
I'll check whether the same thing happens when User1 is logged in.


[Edit]
I get exactly the same output for User1, so I assume this has nothing to do with the behavior.

Copy / paste from Terminal - User1

Code:

Welcome to Linux Lite 4.2 user1
 
tirsdag 05 mars 2019, 18:44:23
Memory Usage: 373/2992MB (12.47%)
Disk Usage: 6/13GB (54%)
Support - https://www.linuxliteos.com/forums/ (Right click, Open Link)
 
 user1  ~  firefox

(firefox:8571): Gtk-WARNING **: 18:44:31.067: Theme parsing error: <data>:1:34: Expected ')' in color definition

(firefox:8571): Gtk-WARNING **: 18:44:31.067: Theme parsing error: <data>:1:76: Expected ')' in color definition
 user1  ~  
therube
Posts: 21714
Joined: March 10th, 2004, 9:59 pm
Location: Maryland USA

Re: Save web page as txt - no longer strips html/code

Post by therube »

Does this happen when saving any web page or only select?

Compare your prefs.js between the two users & see if anything stands out?
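A plain `diff` of the two files works, but comparing by pref name makes it easy to separate prefs that exist for only one user from prefs whose values differ. A minimal sketch, assuming the usual one-`user_pref("name", value);`-per-line layout of prefs.js; the script name `prefs_diff.py` is hypothetical. Close Firefox for both users first, since Firefox may overwrite prefs.js while it runs or when it exits.

```python
import re
import sys

# Matches lines such as: user_pref("browser.download.dir", "/home/user2/Downloads");
PREF_LINE = re.compile(r'^user_pref\("((?:[^"\\]|\\.)*)",\s*(.*)\);\s*$')


def parse_prefs(text):
    """Map pref name -> raw value text for every user_pref(...) line."""
    prefs = {}
    for line in text.splitlines():
        match = PREF_LINE.match(line.strip())
        if match:
            prefs[match.group(1)] = match.group(2)
    return prefs


def diff_prefs(a_text, b_text):
    """Return (only_in_a, only_in_b, changed) as sorted lists of pref names."""
    a, b = parse_prefs(a_text), parse_prefs(b_text)
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    changed = sorted(k for k in a.keys() & b.keys() if a[k] != b[k])
    return only_a, only_b, changed


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 prefs_diff.py first/prefs.js second/prefs.js
    texts = []
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            texts.append(fh.read())
    first, second = parse_prefs(texts[0]), parse_prefs(texts[1])
    only_first, only_second, changed = diff_prefs(texts[0], texts[1])
    for name in only_first:
        print(f"only in first:  {name} = {first[name]}")
    for name in only_second:
        print(f"only in second: {name} = {second[name]}")
    for name in changed:
        print(f"changed: {name}: {first[name]} -> {second[name]}")
```

Expect some noise from timestamp and counter prefs that differ between any two profiles; the interesting lines are usually the ones mentioning downloads, saving or helper applications.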


Did you actually look at the save output?
In the case of https://www.linuxliteos.com/forums/ it may save (by default) with a .htm extension (even when you've selected .txt as the output format), but what you are choosing in the name is only a "name". That name may happen to be "Linux Lite Forums - Index.htm", but if you've selected .txt format as its output, the file is in fact in text format.
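Following that point, the reliable check is the file's content rather than its extension. A minimal sketch, assuming a few structural tags (`<html`, `<head`, `<body`, `<script`, ...) are enough to tell a full HTML save from Firefox's plain-text output, and assuming the sidecar folder follows the usual `<name>_files` naming Firefox uses for "Web Page, complete" saves. The tag list, the threshold of 3, and the name `check_save.py` are illustrative choices, not anything Firefox defines.

```python
import os
import re
import sys

# Opening tags that should not appear in a plain-text save. Firefox's text
# output writes links as "<https://...>", which this pattern does not match.
MARKUP = re.compile(
    r"<\s*(?:!doctype|html|head|body|script|style|div|meta|link)\b", re.IGNORECASE
)


def looks_like_html(text, threshold=3):
    """Heuristic: True if the text contains at least `threshold` HTML tags."""
    return len(MARKUP.findall(text)) >= threshold


def sidecar_folder(path):
    """Return the '<name>_files' folder next to `path` if it exists, else None."""
    folder = os.path.splitext(path)[0] + "_files"
    return folder if os.path.isdir(folder) else None


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python3 check_save.py "saved page.txt"
    path = sys.argv[1]
    with open(path, encoding="utf-8", errors="replace") as fh:
        print("contains HTML markup" if looks_like_html(fh.read()) else "looks like plain text")
    folder = sidecar_folder(path)
    if folder:
        print(f"sidecar folder found: {folder}")
```

Run against a save like the one quoted here, it should report plain text even though the name ends in .htm.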


No "html" to be found, "Linux Lite Forums - Index.htm" ;-).

Code:

2
Trackers
Google Adsense
Google Analytics

  * Forum Home <https://www.linuxliteos.com/forums/index.php>
  * Help <https://www.linuxliteos.com/manual>
  * Bugs <https://www.linuxliteos.com/forums/security-bug-fixes/>
  * Search <https://www.linuxliteos.com/forums/search/>
  * Bookmarks <https://www.linuxliteos.com/forums/?action=bookmarks>
  * Download <https://www.linuxliteos.com/download.php>
  * Donate <https://www.linuxliteos.com/donate.html>
  * Shop <https://www.linuxliteos.com/shop.html>
  * Staff <https://www.linuxliteos.com/forums/?action=staff>
  * Login <https://www.linuxliteos.com/forums/login/>
  * Register <https://www.linuxliteos.com/forums/register/>

*
Linux Lite Forums <https://www.linuxliteos.com/>
Welcome, *Guest*. Please login
<https://www.linuxliteos.com/forums/?action=login> or register
<https://www.linuxliteos.com/forums/?action=register>.
Did you miss your activation email
<https://www.linuxliteos.com/forums/?action=activate>?

Login with username, password and session length


 



You are Here: 	

  * Linux Lite Forums <https://www.linuxliteos.com/forums/index.php>

	

*Linux Lite 4.4 RC1 has been released. See the Release Announcements
section for more information.*




        Recent Posts <https://www.linuxliteos.com/forums/recent/> Recent
        Posts

Linux Lite Forums - Recent Posts
Subscribe to Webslice
<https://www.linuxliteos.com/forums/.xml/?type=webslice>

*Re: Is there a way to access BIOS on a used Lenovo Ideapad G580 2689 ?
<https://www.linuxliteos.com/forums/installing-linux-lite/is-there-any-way-to-dual-boot-ll-on-a-used-lenovo-ideapad-g580-notebook/msg46201/?topicseen#msg46201>*
by DeepThought <https://www.linuxliteos.com/forums/profile/?u=7644>
(Installing Linux Lite
<https://www.linuxliteos.com/forums/installing-linux-lite/>)
    *Today* at 06:44:13 PM
*Re: Linux Lite 4.4 RC1 Released
<https://www.linuxliteos.com/forums/release-announcements/linux-lite-4-4-rc1-released/msg46200/?topicseen#msg46200>*
by Jerry <https://www.linuxliteos.com/forums/profile/?u=2> (Release
...
Grumpus
Posts: 13246
Joined: October 19th, 2007, 4:23 am
Location: ... Da' Swamp

Re: Save web page as txt - no longer strips html/code

Post by Grumpus »

System Monitor is normally installed by default.
Maybe Linux Lite doesn't include it.
I'd look in the repositories and install it, as it gives a lot of information and lets you override software applications that have run amok.

Try this link for a forum: Linux Lite Forums Index
Nettkrawler
Posts: 137
Joined: May 3rd, 2010, 3:31 pm

Re: Save web page as txt - no longer strips html/code

Post by Nettkrawler »

therube wrote:Does this happen when saving any web page or only select?
Yes - it doesn't matter what site I navigate to.
therube wrote:Compare your prefs.js between the two users & see if anything stands out?
Thanks, I'll do that (and try to find a text comparison tool).
therube wrote:Did you actually look at the save output?
Yes I did, every single time. I can verify that the saved file (as User2) is indeed an HTML file, with the sidecar folder included.
For User1, it always saves a real text file (no HTML tags inside and no sidecar files).
therube wrote:In the case of https://www.linuxliteos.com/forums/ it may save (by default) with a .htm extension (even when you've selected .txt as the output format), but what you are choosing in the name is only a "name". That name may happen to be "Linux Lite Forums - Index.htm", but if you've selected .txt format as its output, the file is in fact in text format.
That text is always shown when starting Terminal in that particular distro and is therefore not relevant to the issue (it looks the same for both users, except for the user names of course). It's just normal :D
therube wrote:No "html" to be found, "Linux Lite Forums - Index.htm" ;-).
Indeed - you pointed out the normal, expected behaviour.

I can try the suggestion about file comparison, and if that doesn't lead to a solution, I figure the next step is to just wipe the Firefox user profile for User2.
Nettkrawler
Posts: 137
Joined: May 3rd, 2010, 3:31 pm

Re: Save web page as txt - no longer strips html/code

Post by Nettkrawler »

Update: I ended up wiping the whole FF profile for User2, so I'm now on a completely clean FF profile.

This time, Firefox wasn't able to save the page as a text file at all. Strangely enough, all the sidecar files were downloaded, which is the same behavior as before, except that the main file itself did not download.
[Screenshot: the download reported as failed]

This suggests the problem is not in the Firefox profile itself.

This is indeed very weird; no other program on this computer shows similar errors. But Firefox is the only web browser I know of that can save a page and strip it of all HTML tags.

Do you happen to know of an extension that does that, as a workaround?
tanstaafl
Moderator
Posts: 49647
Joined: July 30th, 2003, 5:06 pm

Re: Save web page as txt - no longer strips html/code

Post by tanstaafl »

Try temporarily choosing "save as type" "web page, HTML only" and then select "save as type" "text files (*.txt)". I find I sometimes need to do that to get it to default to using a .txt file extension. If it defaults to using the .txt file extension it seems to reliably produce a text file with no HTML.
Nettkrawler
Posts: 137
Joined: May 3rd, 2010, 3:31 pm

Re: Save web page as txt - no longer strips html/code

Post by Nettkrawler »

Tried that, tanstaafl.

I also found this cache folder:
/home/user2/.cache/mozilla/firefox/ntrc3s4e.default/

So, just so it wouldn't remain untested, I deleted all its contents; afterwards, Firefox still cannot save web pages as untagged text files.
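The cache folder above is separate from the profile folder where prefs.js and the other settings live; on Ubuntu the profile is normally under `~/.mozilla/firefox/`, with each profile folder listed in `profiles.ini` there. A minimal sketch that lists each profile and whether it has a prefs.js, assuming that default location and the usual `[ProfileN]` sections with `Name`, `IsRelative`, `Path` and `Default` keys; the file name `list_profiles.py` is hypothetical.

```python
import configparser
import os


def list_profiles(ini_text, base_dir):
    """Return (name, absolute_path, is_default) for each [ProfileN] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case (Name, Path, IsRelative, ...)
    parser.read_string(ini_text)
    profiles = []
    for section in parser.sections():
        if not section.startswith("Profile"):
            continue  # skip [General] and any other sections
        entry = parser[section]
        path = entry.get("Path", "")
        if entry.get("IsRelative", "1") == "1":
            path = os.path.join(base_dir, path)
        profiles.append((entry.get("Name", section), path, entry.get("Default") == "1"))
    return profiles


if __name__ == "__main__":
    base = os.path.expanduser("~/.mozilla/firefox")  # assumed default location
    ini_path = os.path.join(base, "profiles.ini")
    if not os.path.exists(ini_path):
        print(f"{ini_path} not found")
    else:
        with open(ini_path, encoding="utf-8") as fh:
            for name, path, is_default in list_profiles(fh.read(), base):
                marker = " (default)" if is_default else ""
                has_prefs = os.path.exists(os.path.join(path, "prefs.js"))
                print(f"{name}{marker}: {path} prefs.js={'yes' if has_prefs else 'no'}")
```

Running it once as each user confirms which profile folder Firefox actually uses before comparing or wiping anything.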

What else could there be? Can there be a problem with Firefox if the user account is a member of a custom group in Linux?
I guess this is a long shot, and if that were the case, one would expect the problem to appear the same way for both users.

Can Firefox have problems with OS user accounts that have long passwords containing non-English characters?

Can there be a problem if the MIME types are somehow incorrectly registered in the OS? This is a wild guess on my part, as I don't have any experience with MIME types.


[edit]

I found the mimeapps.list files for both OS user profiles and did a document comparison using LibreOffice Writer. There are some differences (I can't upload the exported PDF here to show them), and the only one that could possibly be of interest is this line
text/csv=libreoffice-calc.desktop
which exists for user2 but not for user1.

user1 has this line, which user2 doesn't have
text/csv=leafpad.desktop;libreoffice-startcenter.desktop;

There are no differences regarding HTML.
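The same comparison can be done per MIME type instead of line by line in Writer. mimeapps.list is an INI-style file (sections such as `[Default Applications]` and `[Added Associations]`, one `type=handlers` entry per line), usually found at `~/.config/mimeapps.list`. A minimal sketch using Python's `configparser`, assuming that layout; the script name `mimeapps_diff.py` is hypothetical.

```python
import configparser
import sys


def load_mimeapps(text):
    """Map (section, mime_type) -> value for a mimeapps.list file's contents."""
    parser = configparser.ConfigParser(
        delimiters=("=",), interpolation=None, strict=False
    )
    parser.optionxform = str  # keep MIME types exactly as written
    parser.read_string(text)
    return {
        (section, key): value
        for section in parser.sections()
        for key, value in parser.items(section)
    }


def diff_mimeapps(a_text, b_text):
    """List (section, mime_type, value_in_a, value_in_b) wherever the files differ."""
    a, b = load_mimeapps(a_text), load_mimeapps(b_text)
    return [
        (section, mime, a.get((section, mime)), b.get((section, mime)))
        for section, mime in sorted(a.keys() | b.keys())
        if a.get((section, mime)) != b.get((section, mime))
    ]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 mimeapps_diff.py user1-mimeapps.list user2-mimeapps.list
    with open(sys.argv[1], encoding="utf-8") as f1, open(sys.argv[2], encoding="utf-8") as f2:
        for row in diff_mimeapps(f1.read(), f2.read()):
            print(*row, sep=" | ")
```

Each output row shows the section, the MIME type, and the value for each user (`None` where the entry is missing), which makes differences such as the text/csv lines above easy to spot.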

So I'll try editing the mimeapps.list for user2, just to test whether Firefox somehow dislikes the MIME type setup as it is now.


[edit 2]
I changed the mimeapps.list for User2 so that the text/csv line is exactly as for User1.
Did it make any difference? No - no difference.

But Firefox reports the file as "failed to download" (see the screenshot in the post above).
Grumpus
Posts: 13246
Joined: October 19th, 2007, 4:23 am
Location: ... Da' Swamp

Re: Save web page as txt - no longer strips html/code

Post by Grumpus »

Nettkrawler wrote:Can Firefox have problems with user profiles (OS) with long passwords that also contain non-English characters?
A while back there were some security issues with a couple of Linux distros, and the suggestion was to add some odd characters to the string. I have no idea whether it helps protect the password, but it does seem to work without issue on a number of sites requiring sign-in.
Nettkrawler
Posts: 137
Joined: May 3rd, 2010, 3:31 pm

Re: Save web page as txt - no longer strips html/code

Post by Nettkrawler »

Update.

I have just tried logging into a seldom-used guest account (if someone actually asks to borrow the computer, they get a separate account that cannot interfere with any other user's settings), and that user is affected by the issue too.

So that means I have 3 user accounts on this computer, one account is not affected by the problem while the others are.

This is just so odd. I may try adding another user account on the computer, but honestly I don't think it will solve the issue.


[edit]

OK, I have now created a new OS user. Logged in as the new user, I opened Firefox, navigated to a random web page and tried to save the page as text.

Result: Firefox failed to download the file, but the folder with all the sidecar files downloaded seemingly fine (it was not supposed to download at all).

With this, it seems I can conclude that the OS user profile settings are not to blame. But then again, it is so weird that User1 is not affected; it just doesn't make any sense.

What may I have missed here? ](*,) :x
Grumpus
Posts: 13246
Joined: October 19th, 2007, 4:23 am
Location: ... Da' Swamp

Re: Save web page as txt - no longer strips html/code

Post by Grumpus »

You shouldn't have gotten the associated files (images, etc.) for the site without the HTML; look in your home directory.
Nettkrawler
Posts: 137
Joined: May 3rd, 2010, 3:31 pm

Re: Save web page as txt - no longer strips html/code

Post by Nettkrawler »

In Synaptic, I apparently have concy installed, but I haven't figured out whether it is something I can actually launch and run.

Here's the thing, because I'm apparently stupid enough to just continue the quest of solving this hopeless issue.

I have had Memtest86+ running all night - like eight hours - until I turned it off this morning. No errors found.

Another thing I tested: erasing all files inside the Firefox user profile folder for User2, then copying all files from User1's Firefox profile folder into User2's profile folder, so that User2 got the same settings as User1. In theory (my theory), that should fix the problem: User2, running Firefox with a file-by-file copy of User1's Firefox profile, should now be able to properly save a web page as a text file.

But no....

The first thing that happens when User2 runs Firefox with a copy of User1's Firefox profile folder is that, at first glance, it looks OK. But whenever I click on a link or enter any URL in the address field, Firefox just gets stuck loading and never manages to load any web page at all.

So I decided to run Firefox in safe mode and have it refresh the user profile.

New try. Result? Surprise, surprise: the behaviour when saving a web page as a local text file is still unchanged :head-bash-into-thick-concrete-wall:

In case it's of any interest, here is the Terminal output from running Firefox in safe mode:

Code:

firefox -safe-mode
(firefox:9511): Gtk-WARNING **: 22:09:51.684: Theme parsing error: <data>:1:34: Expected ')' in color definition
(firefox:9511): Gtk-WARNING **: 22:09:51.684: Theme parsing error: <data>:1:76: Expected ')' in color definition
###!!! [Parent][MessageChannel] Error: (msgtype=0x190084,name=PBrowser::Msg_Destroy) Closed channel: cannot send/recv
###!!! [Child][MessageChannel] Error: (msgtype=0x4C0005,name=PHttpChannel::Msg_Cancel) Closed channel: cannot send/recv
###!!! [Child][MessageChannel] Error: (msgtype=0x4C0021,name=PHttpChannel::Msg_SetPriority) Closed channel: cannot send/recv
###!!! [Child][MessageChannel] Error: (msgtype=0x190001,name=PBrowser::Msg_AsyncMessage) Closed channel: cannot send/recv
###!!! [Child][MessageChannel] Error: (msgtype=0x300112,name=PContent::Msg_DetachBrowsingContext) Closed channel: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x190084,name=PBrowser::Msg_Destroy) Closed channel: cannot send/recv
###!!! [Child][MessageChannel] Error: (msgtype=0x300112,name=PContent::Msg_DetachBrowsingContext) Closed channel: cannot send/recv

Code:

firefox -safe-mode
(firefox:9768): Gtk-WARNING **: 22:11:56.833: Theme parsing error: <data>:1:34: Expected ')' in color definition
(firefox:9768): Gtk-WARNING **: 22:11:56.833: Theme parsing error: <data>:1:76: Expected ')' in color definition
(firefox:9768): Gtk-WARNING **: 22:12:09.311: Theme parsing error: <data>:1:34: Expected ')' in color definition
(firefox:9768): Gtk-WARNING **: 22:12:09.311: Theme parsing error: <data>:1:76: Expected ')' in color definition
[Parent 9768, Gecko_IOThread] WARNING: pipe error (199): Connection reset by peer: file /build/firefox-wWf_B4/firefox-64.0+build3/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 363
Locked