Backdoor in upstream xz/liblzma leading to ssh server compromise

e0qdk@reddthat.com · 21 days ago

I wrote something like this before for academic researchers to load data sets on display walls by using their cellphones. I approached it by building a simple website. When the user logs in, they’d see a table of entries (from a directory listing on a shared file server that they could drop their data sets onto) and could click a button that made a form post to the server which caused it to run whichever programs were needed to load the data set they wanted (or run a couple of other handy commands – like turning the monitors on/off, etc).

You can do something like that too in Python if you want:

Learn how to start and stop programs from Python scripts. This can be done with the built-in subprocess library. If you know how to launch the programs you want from the command line, it shouldn't be too hard to figure out how to do it from Python by reading the documentation. It will take some more effort to figure out how to interact with a program you've launched (e.g. to stop it in response to user input) without blocking your script, but this can be done.
Learn how to write a simple program that can respond to HTTP requests in Python. There are a number of libraries like tornado, flask, cherrypy, etc. that can do this. Pick one, read the documentation, and write a tiny page that allows you to submit a form and then trigger an action on the server in response to an HTTP POST. You should be able to interact with it by pointing the browser on your computer to localhost (possibly plus a port), or from another device on your LAN by putting your computer's IP into the address bar.
Figure out how you’re going to organize the entries you want to be able to load. You could just do something trivial like putting the files in known folders and running os.listdir, or something more involved like tracking the entries with a spreadsheet or database or JSON file that lets you associate custom metadata with each entry (like a custom name to show or an icon to display or when it was last launched, etc.)
Generate a web page based on that data collection. I recommend using templating – e.g. with Mustache or Jinja. Basically you write some HTML-like text that marks places to fill in data from your program; the template engine does the escaping needed for HTML output (converting symbols like < into &lt;) and can also repeat patterns using entries from lists you provide to build the rows of tables and such for you.
Set up some security (e.g. a simple log in system) and polish it up as much as you care to do.
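Putting those steps together, here's a rough single-file sketch of the idea. Assumptions, so adjust to taste: it uses the standard library's http.server and html.escape instead of tornado/flask/cherrypy and a real template engine (just to avoid dependencies), LAUNCH_COMMAND = ["xdg-open"] is a hypothetical placeholder for whatever program actually loads your data, and every file in one folder counts as an entry. There's no login (step 5), so only run it on a network you trust.

```python
import html
import os
import subprocess
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

# Hypothetical placeholder: the program that "loads" an entry. Replace it with
# whatever you'd type by hand; the entry's path is appended as the last argument.
LAUNCH_COMMAND = ["xdg-open"]


class Launcher:
    """Step 1: start/stop one program at a time without blocking the server."""

    def __init__(self, command):
        self.command = command
        self.proc = None

    def start(self, path):
        self.stop()  # replace whatever is currently running
        self.proc = subprocess.Popen(self.command + [path])  # returns immediately

    def stop(self):
        if self.proc is not None and self.proc.poll() is None:  # still running
            self.proc.terminate()
            try:
                self.proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                self.proc.kill()  # it ignored terminate(); force it
                self.proc.wait()
        self.proc = None


def list_entries(data_dir):
    """Step 3: the trivial scheme, where every file in one folder is an entry."""
    return sorted(name for name in os.listdir(data_dir)
                  if os.path.isfile(os.path.join(data_dir, name)))


def render_page(entries):
    """Step 4: build the table; html.escape turns < into &lt; (and quotes too)."""
    rows = "".join(
        "<tr><td>{0}</td><td><form method='post'>"
        "<input type='hidden' name='entry' value='{0}'>"
        "<button>Load</button></form></td></tr>".format(html.escape(name))
        for name in entries
    )
    return "<!doctype html><title>Loader</title><table>{}</table>".format(rows)


def make_handler(data_dir, launcher):
    """Step 2: GET shows the table, POST launches the chosen entry."""

    class Handler(BaseHTTPRequestHandler):
        def _send_page(self):
            body = render_page(list_entries(data_dir)).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def do_GET(self):
            self._send_page()

        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            form = parse_qs(self.rfile.read(length).decode("utf-8"))
            name = form.get("entry", [""])[0]
            # Only accept names from the listing, so "../something" is ignored.
            if name in list_entries(data_dir):
                launcher.start(os.path.join(data_dir, name))
            self._send_page()

    return Handler


# Serve only when a folder is given: python loader.py /path/to/datasets
if __name__ == "__main__" and len(sys.argv) > 1:
    handler = make_handler(sys.argv[1], Launcher(LAUNCH_COMMAND))
    # No login/HTTPS here (step 5); keep this on a trusted LAN.
    HTTPServer(("0.0.0.0", 8000), handler).serve_forever()
```

Then browse to port 8000 on that machine. The Launcher keeps the Popen handle around so the server never waits on the child; stop() calls terminate() and falls back to kill() after 5 seconds. One caveat: a launcher like xdg-open may hand off to another app and exit right away, in which case stop() can't close the viewer, so it works best when you run the actual program directly.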

Good luck and have fun!

e0qdk@reddthat.com · 3 months ago

I was curious, so I did some searches on this topic for you and found these pages:

The second link in particular notes:

The reason that things are much easier with all ASCII data is that practically every Unicode encoding in existence maps bytes 0x00…0x7f to the corresponding code points, so byte strings and Unicode strings that contain the same all-ASCII data are basically equivalent, even semantically. What usually trips people up with non-ASCII data is that the semantic meaning of bytes in the range 0x80…0xff changes from one encoding to another.

But, thinking like a systems programmer again, for many purposes the semantic meaning of bytes 0x80…0xff doesn’t matter. All that matters is that those bytes are preserved unchanged by whatever operations are done. Typical operations like tokenizing strings, looking for markers indicating particular types of data, etc. only need to care about the meaning of bytes in the range 0x00…0x7f; bytes in the range 0x80…0xff are just along for the ride.

So the trick for beating Python 3 strings into submission is to put in encoding and decoding calls where you need to, choosing a single-byte encoding that doesn't mutate 0x80…0xff. There are many of these; most of the Latin-{1…6} sequence (aka ISO-8859-1…10) has this property. What you do not want to do is pick utf-8 or any of the multibyte Asian encodings. Latin-1 will do fine; in fact it has an advantage over the others in memory consumption, which we'll describe below.

Whether depending on this is actually correct or not is beyond me, but it seems like people have actually been using that pass-through behavior in practice and put it into things like Python2 -> 3 migration guides.

The first link suggests that the seemingly undefined ranges are valid as C0 and C1 control codes, which may be why decoding them doesn't throw errors.
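If you want to check both points yourself, here's a small Python 3 sketch. It only tests Latin-1 (not the other ISO-8859 variants), and the sample record bytes are made up for illustration.

```python
import unicodedata

# Every possible byte value, including the 0x80..0xff range.
raw = bytes(range(256))

# Latin-1 maps byte N to code point U+00NN, so decoding cannot fail...
text = raw.decode("latin-1")
assert [ord(ch) for ch in text] == list(range(256))
# ...and encoding back reproduces the exact same bytes.
assert text.encode("latin-1") == raw

# Bytes 0x80..0x9f land on the C1 control characters (category "Cc"),
# so they are assigned code points rather than undefined gaps.
assert all(unicodedata.category(chr(cp)) == "Cc" for cp in range(0x80, 0xA0))

# Tokenizing on ASCII markers (';' and '=') leaves high bytes untouched.
record = b"name=caf\xe9\x80;tag=\xff\xfe"
fields = dict(part.split("=", 1) for part in record.decode("latin-1").split(";"))
assert fields["name"].encode("latin-1") == b"caf\xe9\x80"

# UTF-8 is not a pass-through: this byte sequence is invalid UTF-8.
try:
    record.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print("utf-8 decodes record:", utf8_ok)  # prints False
```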

e0qdk@reddthat.com · 4 months ago

That requires turning every read into a write – which is slow/expensive generally. (That might not matter much for Google – who try to record everything you ever do already, basically – but it matters for everyone else.)

Also, it tends to promote spam and offensive niche content. kbin’s got a sidebar that tries to promote random low activity communities and posts, for example, and it’s almost uncanny how much crap it pushes up…

e0qdk@reddthat.com · 5 months ago

My old username from reddit and HN was already taken and I couldn’t think of anything else I wanted to be called so I just picked some random characters like this:
>>> import random
>>> ''.join([random.choice("abcdefghijklmnopqrstuvwxyz0123456789") for x in range(5)])
'e0qdk'

I have that literally in my kbin profile, but it’s not on my reddthat one. (I think I tried to copy it there originally when I set up the account but ran into some issue with Lemmy’s UI – been long enough that I forget what exactly.)

e0qdk@reddthat.com · 5 months ago (edited)

I spent a while looking thanks to your post and only found stuff from 2022 as well. My Chinese is basically non-existent though. (I can pick out a word here and there from knowing some Japanese, but that’s about it.) Someone who knows Chinese might have better luck digging.

I did find this file from 2022 (14999x6982 – ⚠️ 100+ MB PNG): https://upload.wikimedia.org/wikipedia/commons/8/8c/The_geologic_map_of_the_Moon_at_1-2.5M_scale.png

Associated information (and preview): https://commons.wikimedia.org/wiki/File:The_geologic_map_of_the_Moon_at_1-2.5M_scale.png

I assume that’s the one you’re referring to from 2022?

All the news stories just have low-res previews.

Is there a preview that looks different from this? I don’t see a preview at all (just a picture of people at some sort of presentation) in your link – but my browser might just not be loading it if there is one. (I generally block scripts.)

Edit: tweaked wording slightly

e0qdk@reddthat.com · 5 months ago

I think this is just using SpeechDispatcher from the system – so it’s not a Firefox specific thing. I get a similar (but very slightly different) voice on my own system by default – which matches what I get when I run a command like spd-say --wait "Hello world" from the command line.

I’m pretty sure SpeechDispatcher can be configured to use a different synthesis engine – Arch’s wiki has some suggestions: https://wiki.archlinux.org/title/Speech_dispatcher – but I haven’t dug into it yet.
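For anyone who wants to drive this from a script, here's a rough Python sketch that shells out to spd-say. It assumes your spd-say build has the --output-module (-o) and --list-output-modules (-O) options, which is what I'd expect, but check spd-say --help on your system. Which engines actually show up depends on what's installed.

```python
import shutil
import subprocess
import sys


def spd_say_command(text, output_module=None, wait=True):
    """Build an spd-say argument list (nothing is run here)."""
    cmd = ["spd-say"]
    if wait:
        cmd.append("--wait")  # block until the message is spoken
    if output_module:
        # Ask Speech Dispatcher to use a different synthesis engine.
        cmd += ["--output-module", output_module]
    cmd.append(text)
    return cmd


def list_output_modules():
    """Return spd-say's raw listing of the engines the server knows about."""
    result = subprocess.run(["spd-say", "--list-output-modules"],
                            capture_output=True, text=True, check=True)
    return result.stdout


# Usage: python say.py Hello world
if __name__ == "__main__" and len(sys.argv) > 1:
    if shutil.which("spd-say") is None:
        sys.exit("spd-say not found; is Speech Dispatcher installed?")
    print(list_output_modules())
    subprocess.run(spd_say_command(" ".join(sys.argv[1:])), check=True)
```

Once you know a module name from the listing, pass it as output_module to hear the difference; making it the default is a config change per the Arch wiki page.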

e0qdk@reddthat.com · 5 months ago

Maybe you’d be interested in “kinetic novels”? They’re basically VNs without choices.

e0qdk@reddthat.com · 5 months ago

e0qdk@reddthat.com · 5 months ago (edited)

The Japanese text on the bottom of the left image says: Sapporo (Draft) Black Label beer. I can’t tell what the four characters under 生 are though. (Too blurry for me to figure out.)

Edit: those characters might be 非熱処理 – meaning unpasteurized.

e0qdk@reddthat.com · 6 months ago

Additional relevant discussion on HN: https://news.ycombinator.com/item?id=39865810

e0qdk@reddthat.com · 6 months ago

Backdoor in upstream xz/liblzma leading to ssh server compromise

e0qdk@reddthat.com · 6 months ago

It’s often in comments in the JS file(s) – sometimes with the licenses quoted entirely and sometimes in a form abbreviated by an automated code packer. Probably a lot of sites aren’t actually compliant with the terms of the licenses doing things that way, but IANAL.