libname NAME 'location on hard drive';
For example:
libname hw2 'C:/temp';
File menu, import wizard
format is always CSV (Comma Separated Values)
member name: call it blah
The data file cannot be open in another program while you import it
Most commands are called proc (procedure)
data=libname.filename;
Proc contents data=hw2.data;
Run;
proc means data=hw2.data;
run;
proc means only gets the means for numeric variables
n = number of nonmissing values
very good to find errors
click running person or press F8
proc means data=hw2.data;
title "Question 1"; /* this titles the means output page */
*this is a comment; /* green text is a comment, not part of the code */
run; /* when this turns blue, the step is ready to run */
proc means data=hw2.data;
title "Question 1";
*this is a comment;
var purch child; /* this is so only these variables will show up */
run;
this copies data from data to data2
data hw2.data2;
set hw2.data;
run;
if you have a zip code, make it text (a character field, like in a database), since it is a label and not a number you do math on
proc means data=libname.filename mean skew;
run;
mean finds the average
skew finds how lopsided (asymmetric) the distribution is, not how spread out it is
median - 50 % on each side
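A quick sketch of asking for all of these statistics at once. It assumes the hw2.data set and the purch and child variables from earlier in these notes; swap in whatever numeric variables you actually have.

```sas
/* Sketch: request specific statistics by keyword on the proc means line. */
/* Assumes hw2.data exists and purch and child are numeric variables.     */
proc means data=hw2.data n mean median skew;
  var purch child;
run;
```

If the mean and median are far apart, the skew value will usually be far from 0 too.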
nw side of town - house in lawrence, doug compton , has huge mansion and giraffes zebras elephant!
\
\
ELEPHANT\____________________
FS folks? HYVEE
__________________________
6th st
wak
Data commands:
1. manipulate: modify variables
- transform variable types - character to numeric
- create a variable as a function of other variables
- create indicator variables representing the unique values of a character variable (e.g. ks=1 when state='KS')
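The three kinds of manipulation above might look roughly like this in one data step. This is only a sketch: age_char, the total variable, and the output name hw2.blah_mod are made up for illustration, and it assumes hw2.blah has purch, child, and state in it.

```sas
/* Sketch only: age_char, total, and hw2.blah_mod are hypothetical names. */
data hw2.blah_mod;
  set hw2.blah;
  /* character to numeric: input() reads the text with a numeric informat */
  age_num = input(age_char, 8.);
  /* new variable as a function of other variables */
  total = purch + child;
  /* indicator variable: the comparison gives 1 when true, 0 when false */
  ks = (state = 'KS');
run;
```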
libname hw2 'e:/';
proc means data=hw2.blah;
run;
proc sort data=hw2.blah;
by state;
run;
proc means data=hw2.blah skew;
by state;
run;
this one copies hw2.blah into a new data set called new
data hw2.new;
set hw2.blah;
run;
this one makes a column at the end where x = 1 for every row
you could do something like year=2009 and then every row in that file would be marked as from that year. the name of a variable can't start with a number and can't have spaces
data hw2.new;
set hw2.blah;
x=1;
run;
data hw2.new2;
set hw2.new;
x2=2*x;
run;
if you put the same name after data (the new one) and after set (where you get the info from), i.e. make them both the name of the original data, it will overwrite the original
this makes a variable called r in new2 (built from new); r is a random number between 0 and 1. RANDOM NUMBERS.
data hw2.new2;
set hw2.new;
x=1;
x2=2*x;
r=ranuni (0);
run;
if statements
pull out only the rows where gender is male
data hw2.new2;
set hw2.new;
x=1;
x2=2*x;
r=ranuni (0);
if gender='M';
run;
Make sure you close a data set before altering it, or it won't work.
if you only want about 5000 random entries out of 50000 (10%):
generate a random number for each row, then select based on that number. (this gives roughly 10%, not exactly 5000)
data hw2.blah3;
set hw2.blah;
x=1;
x2=2*x;
r=ranuni (0);
if r<=.1; /* this tells it to select 10% of the data: the rows that were randomly assigned a number of .1 or less */
run;
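The ranuni approach keeps about 10% of the rows, so the count will bounce around 5000. If you need exactly 5000, one alternative (not what was shown in class) is proc surveyselect, which draws a simple random sample of a fixed size. The output name hw2.sample5000 and the seed value are arbitrary choices for this sketch.

```sas
/* Sketch: exact-size simple random sample, an alternative to ranuni.  */
/* hw2.sample5000 and seed=12345 are arbitrary; n= is the sample size. */
proc surveyselect data=hw2.blah out=hw2.sample5000
                  method=srs n=5000 seed=12345;
run;
```

Fixing the seed makes the same sample come out every time you rerun it.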
the new data set name goes on top after data, then the one you get the info from goes after set
this makes a new file called newest: a random 10% sample is taken, and if the row is a female it sets female=1; if male it sets female=0
data hw2.newest;
set hw2.blah;
r=ranuni (0);
if r<=.1;
if gender='F' then female=1;
if gender='M' then female=0;
run;
this makes a new file called newest: a random 10% sample is taken, and if the row is a female it sets female=1; for anything else it sets female=0
data hw2.newest;
set hw2.blah;
r=ranuni (0);
if r<=.1;
if gender='F' then female=1;
else female=0;
run;
keep/drop - decide which variables to keep in the new data set
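A small sketch of both statements, since they were only mentioned in passing. The output names hw2.small and hw2.small2 are made up, and it assumes hw2.blah has state and gender in it.

```sas
/* Sketch: hw2.small and hw2.small2 are hypothetical output names. */
data hw2.small;
  set hw2.blah;
  keep state gender;   /* only these variables go into hw2.small */
run;

data hw2.small2;
  set hw2.blah;
  drop gender;         /* everything except gender goes into hw2.small2 */
run;
```

Use whichever list is shorter to type.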
2 commands that may be useful in hw
freq - how often each value of a variable occurs
for example - how many times each state shows up
proc freq data=hw2.blah;
table state;
run;
MAKES A HISTOGRAM (needs a numeric variable; state is character, so this uses first)
proc univariate data=hw2.blah;
var first;
histogram first;
run;
proc ttest data=hw2.blah;
class gender;
var first;
run;
what you need in the hw
answer: carefully and specifically
SAS output: copy and paste the relevant output you used to derive your answer - you can abbreviate the output to show only the relevant part - get it from the output window
SAS code: the code used to get the output
example question: are men more familiar or not
the bigger the sample, easier to reject the null
proc ttest data=hw2.one;
class gender;
var first;
run;
the p-value (Pr > |t| in the output), not the t value, is the minimum alpha at which you can reject the null.
im so hungry and not paying attn.
CROSSTABS
proc freq data=hw2.one;
table gender*buyer;
run;
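The crosstab by itself only gives counts and percentages. If you also want a test of whether gender and buyer are related, one option (an addition to what was shown in class) is the chisq option on the table statement. Same hw2.one data set and variables as above.

```sas
/* Sketch: crosstab plus a chi-square test of association. */
proc freq data=hw2.one;
  table gender*buyer / chisq;
run;
```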
9-23
tuesday @5
he'll be in a chatroom
decision tree review
to do decision trees you have to use enterprise miner
solutions >analysis>enterprise miner (E)
file>new...> project
the data has to be imported into SAS already
look at 1st icon - drag to project view (input data source)
double click
select library and then file
it will select 2000 random data points by default, so you have to tell it to use all observations.
click variables. right click, select model role, and reject the ones you don't want to use
set buyer as the target
select book categories as inputs
close sm window
drag tree to project space
click next to file, drag arrow to tree
double click on tree icon
click basic - use chi squared
significance level .2 is fine, unless you want to use the significance level to make the tree stop
min number of observations in a leaf - how many it has to have
max branches - binary=2
click advanced - model assessment measure - proportion of event in top 10, how much separation there should be between buyer and non buyer
right click on the tree and make it run. it will make crosstabs etc
click around the optimal tree, this has 23.
view tree
...1 6...
less than 1 1-6
leaf statistics
right click and copy , edit copy , or save tree as gif
answer the question, show the tree.